From patchwork Tue Jan 7 08:17:33 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 13928412
From: Shuai Xue
To: yazen.ghannam@amd.com, mark.rutland@arm.com, catalin.marinas@arm.com,
 mingo@redhat.com, robin.murphy@arm.com, Jonathan.Cameron@Huawei.com,
 bp@alien8.de, rafael@kernel.org, linux-arm-kernel@lists.infradead.org,
 wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com,
 tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com,
 james.morse@arm.com, tongtiangen@huawei.com, gregkh@linuxfoundation.org,
 will@kernel.org, jarkko@kernel.org
Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
 linux-edac@vger.kernel.org, x86@kernel.org, xueshuai@linux.alibaba.com,
 justin.he@arm.com, ardb@kernel.org, ying.huang@linux.alibaba.com,
 ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, tglx@linutronix.de,
 dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com,
 robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com,
 zhuo.song@linux.alibaba.com
Subject: [PATCH v18 1/3] ACPI: APEI: send SIGBUS to current task if
 synchronous memory error not recovered
Date: Tue, 7 Jan 2025 16:17:33 +0800
Message-ID: <20250107081735.16159-2-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20250107081735.16159-1-xueshuai@linux.alibaba.com>
References: <20250107081735.16159-1-xueshuai@linux.alibaba.com>

A synchronous error is detected when a user-space process accesses
memory containing a 2-bit uncorrected error. The CPU then takes a
synchronous error exception, such as a Synchronous External Abort (SEA)
on Arm64. The kernel queues a memory_failure() work item that poisons
the affected page, unmaps it, and sends SIGBUS to the process, so that a
system-wide panic can be avoided. However, no memory_failure() work is
queued when abnormal synchronous errors occur.
These errors include situations such as an invalid PA, unexpected
severity, no memory failure config support, an invalid GUID section,
etc. In such cases, the user-space process will trigger the SEA again.
This loop can potentially exceed the platform firmware threshold or even
trigger a kernel hard lockup, leading to a system reboot.

Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.

Signed-off-by: Shuai Xue
Reviewed-by: Jarkko Sakkinen
Reviewed-by: Jonathan Cameron
Reviewed-by: Yazen Ghannam
Reviewed-by: Jane Chu
---
 drivers/acpi/apei/ghes.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 07789f0b59bc..0e643e26c8ad 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -801,6 +801,17 @@ static bool ghes_do_proc(struct ghes *ghes,
 		}
 	}
 
+	/*
+	 * If no memory failure work is queued for abnormal synchronous
+	 * errors, do a force kill.
+	 */
+	if (sync && !queued) {
+		dev_err(ghes->dev,
+			HW_ERR GHES_PFX "%s:%d: synchronous unrecoverable error (SIGBUS)\n",
+			current->comm, task_pid_nr(current));
+		force_sig(SIGBUS);
+	}
+
 	return queued;
 }
 

From patchwork Tue Jan 7 08:17:34 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 13928413
From: Shuai Xue
Subject: [PATCH v18 2/3] mm: memory-failure: move return value
 documentation to function declaration
Date: Tue, 7 Jan 2025 16:17:34 +0800
Message-ID: <20250107081735.16159-3-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20250107081735.16159-1-xueshuai@linux.alibaba.com>
References: <20250107081735.16159-1-xueshuai@linux.alibaba.com>

Some of the return value comments for memory_failure() were originally
documented at the call site.
Move those comments to the function declaration to improve code
readability and to give developers immediate access to function usage
and return information.

Signed-off-by: Shuai Xue
Reviewed-by: Jarkko Sakkinen
Reviewed-by: Jonathan Cameron
Reviewed-by: Yazen Ghannam
Reviewed-by: Jane Chu
---
 arch/x86/kernel/cpu/mce/core.c |  7 -------
 mm/memory-failure.c            | 10 +++++++---
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7fb5556a0b53..d1dd7f892514 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1398,13 +1398,6 @@ static void kill_me_maybe(struct callback_head *cb)
 		return;
 	}
 
-	/*
-	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
-	 * to the current process with the proper error info,
-	 * -EOPNOTSUPP means hwpoison_filter() filtered the error event,
-	 *
-	 * In both cases, no further processing is required.
-	 */
 	if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
 		return;
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a7b8ccd29b6f..14c316d7d38d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2211,9 +2211,13 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
  * Must run in process context (e.g. a work queue) with interrupts
  * enabled and no spinlocks held.
  *
- * Return: 0 for successfully handled the memory error,
- *         -EOPNOTSUPP for hwpoison_filter() filtered the error event,
- *         < 0(except -EOPNOTSUPP) on failure.
+ * Return:
+ *   0             - success,
+ *   -ENXIO        - memory not managed by the kernel
+ *   -EOPNOTSUPP   - hwpoison_filter() filtered the error event,
+ *   -EHWPOISON    - the page was already poisoned, potentially
+ *                   kill process,
+ *   other negative values - failure.
  */
 int memory_failure(unsigned long pfn, int flags)
 {

From patchwork Tue Jan 7 08:17:35 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 13928414
From: Shuai Xue
Subject: [PATCH v18 3/3] ACPI: APEI: handle synchronous exceptions in task
 work
Date: Tue, 7 Jan 2025 16:17:35 +0800
Message-ID: <20250107081735.16159-4-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20250107081735.16159-1-xueshuai@linux.alibaba.com>
References: <20250107081735.16159-1-xueshuai@linux.alibaba.com>

An uncorrected memory error can be signaled by an asynchronous interrupt
(specifically, an SPI on arm64 platforms), e.g. when an error is
detected by a background scrubber, or by a synchronous exception
(specifically, a data abort exception on arm64 platforms), e.g. when a
CPU tries to access a poisoned cache line. Currently, both synchronous
and asynchronous errors use memory_failure_queue() to schedule
memory_failure() to execute in a kworker context.

As a result, when a user-space process accesses poisoned data, a data
abort is taken and memory_failure() is executed in the kworker context.
In that context, memory_failure():

  - sends the wrong si_code with the SIGBUS signal in early_kill mode, and
  - cannot kill the user-space process in some cases, resulting in a
    synchronous error infinite loop

Issue 1: send wrong si_code in early_kill mode

Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as
MF_ACTION_REQUIRED on synchronous events"), the flag MF_ACTION_REQUIRED
can be used to determine whether a synchronous exception occurred on the
ARM64 platform. When a synchronous exception is detected, the kernel is
expected to terminate the current process, which has accessed the
poisoned page. This is done by sending a SIGBUS signal with the error
code BUS_MCEERR_AR, indicating an action-required machine check error on
read.

However, when kill_proc() is called to terminate the processes that have
the poisoned page mapped, it sends the incorrect SIGBUS error code
BUS_MCEERR_AO because the context in which it operates is not the one
where the error was triggered.

To reproduce this problem:

  # STEP1: enable early kill mode
  #sysctl -w vm.memory_failure_early_kill=1
  vm.memory_failure_early_kill = 1

  # STEP2: inject a UCE error and consume it to trigger a synchronous error
  #einj_mem_uc single
  0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
  injecting ...
  triggering ...
  signal 7 code 5 addr 0xffffb0d75000
  page not present
  Test passed

The si_code (code 5) reported by einj_mem_uc indicates a BUS_MCEERR_AO
error, which is incorrect.

After this patch:

  # STEP1: enable early kill mode
  #sysctl -w vm.memory_failure_early_kill=1
  vm.memory_failure_early_kill = 1

  # STEP2: inject a UCE error and consume it to trigger a synchronous error
  #einj_mem_uc single
  0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
  injecting ...
  triggering ...
  signal 7 code 4 addr 0xffffb0d75000
  page not present
  Test passed

The si_code (code 4) reported by einj_mem_uc indicates a BUS_MCEERR_AR
error, as expected.

Issue 2: a synchronous error infinite loop

If a user-space process, e.g. devmem, accesses a poisoned page for which
the HWPoison flag is set, kill_accessing_process() is called to send
SIGBUS to the current process with error info. Because memory_failure()
is executed in the kworker context, it does nothing but return EFAULT.
So devmem accesses the poisoned page and triggers an exception again,
resulting in a synchronous error infinite loop. Such an exception loop
may cause platform firmware to exceed some threshold and reboot when
Linux could have recovered from this error.

To reproduce this problem:

  # STEP 1: inject a UCE error, and kernel will set HWPoison flag for related page
  #einj_mem_uc single
  0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
  injecting ...
  triggering ...
  signal 7 code 4 addr 0xffffb0d75000
  page not present
  Test passed

  # STEP 2: access the same page and it will trigger a synchronous error infinite loop
  devmem 0x4092d55b400

To fix the above two issues, queue memory_failure() as a task_work so
that it runs in the context of the process that is actually consuming
the poisoned data.
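
For readers who want to check the si_code behaviour from user space, the
two values discussed above can be told apart inside a SIGBUS handler.
What follows is a minimal user-space sketch under stated assumptions, not
part of this patch and not the einj_mem_uc implementation: enum mce_kind,
classify_sigbus(), sigbus_handler() and install_sigbus_handler() are
hypothetical names, and the fallback values 4 and 5 are simply the codes
shown in the test output above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallbacks, only used if the libc headers do not expose these names. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4	/* action required: the access consumed poison */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5	/* action optional: poison reported asynchronously */
#endif

/* The commit message quotes "code 4" and "code 5"; check that at build time. */
static_assert(BUS_MCEERR_AR == 4 && BUS_MCEERR_AO == 5,
	      "si_code values differ from the ones quoted in the commit message");

/* Hypothetical classification, introduced only for this sketch. */
enum mce_kind { MCE_NOT_MCE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

static enum mce_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return MCE_ACTION_OPTIONAL;
	default:
		return MCE_NOT_MCE;
	}
}

/* Last classification seen by the handler; sig_atomic_t keeps the store safe. */
static volatile sig_atomic_t last_kind = MCE_NOT_MCE;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_kind = classify_sigbus(info->si_code);
}

/* Returns 0 on success, -1 on failure (the sigaction() return value). */
static int install_sigbus_handler(void)
{
	struct sigaction sa = { 0 };

	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive siginfo_t */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A test program would call install_sigbus_handler() once at startup and
inspect last_kind after the faulting access. With the behaviour
described under Issue 1, a process that consumes the poison itself would
be expected to observe MCE_ACTION_REQUIRED after this patch.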

Signed-off-by: Shuai Xue
Tested-by: Ma Wupeng
Reviewed-by: Kefeng Wang
Reviewed-by: Xiaofei Tan
Reviewed-by: Baolin Wang
Reviewed-by: Jarkko Sakkinen
Reviewed-by: Jonathan Cameron
Reviewed-by: Jane Chu
Reviewed-by: Yazen Ghannam
---
 drivers/acpi/apei/ghes.c | 81 +++++++++++++++++++++++-----------------
 include/acpi/ghes.h      |  3 --
 include/linux/mm.h       |  1 -
 mm/memory-failure.c      | 13 -------
 4 files changed, 46 insertions(+), 52 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0e643e26c8ad..324fe9d306e6 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -466,28 +466,41 @@ static void ghes_clear_estatus(struct ghes *ghes,
 		ghes_ack_error(ghes->generic_v2);
 }
 
-/*
- * Called as task_work before returning to user-space.
- * Ensure any queued work has been done before we return to the context that
- * triggered the notification.
+/**
+ * struct ghes_task_work - for synchronous RAS event
+ *
+ * @twork:                callback_head for task work
+ * @pfn:                  page frame number of corrupted page
+ * @flags:                work control flags
+ *
+ * Structure to pass task work to be handled before
+ * returning to user-space via task_work_add().
  */
-static void ghes_kick_task_work(struct callback_head *head)
+struct ghes_task_work {
+	struct callback_head twork;
+	u64 pfn;
+	int flags;
+};
+
+static void memory_failure_cb(struct callback_head *twork)
 {
-	struct acpi_hest_generic_status *estatus;
-	struct ghes_estatus_node *estatus_node;
-	u32 node_len;
+	struct ghes_task_work *twcb = container_of(twork, struct ghes_task_work, twork);
+	int ret;
 
-	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
-	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
-		memory_failure_queue_kick(estatus_node->task_work_cpu);
+	ret = memory_failure(twcb->pfn, twcb->flags);
+	gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
 
-	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
-	gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len);
+	if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP)
+		return;
+
+	pr_err("%#llx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
+			twcb->pfn, current->comm, task_pid_nr(current));
+	force_sig(SIGBUS);
 }
 
 static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 {
+	struct ghes_task_work *twcb;
 	unsigned long pfn;
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
@@ -501,6 +514,18 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 	}
 
+	if (flags == MF_ACTION_REQUIRED && current->mm) {
+		twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb));
+		if (!twcb)
+			return false;
+
+		twcb->pfn = pfn;
+		twcb->flags = flags;
+		init_task_work(&twcb->twork, memory_failure_cb);
+		task_work_add(current, &twcb->twork, TWA_RESUME);
+		return true;
+	}
+
 	memory_failure_queue(pfn, flags);
 	return true;
 }
@@ -745,8 +770,8 @@ int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
 
-static bool ghes_do_proc(struct ghes *ghes,
-			 const struct acpi_hest_generic_status *estatus)
+static void ghes_do_proc(struct ghes *ghes,
+			 const struct acpi_hest_generic_status *estatus)
 {
 	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
@@ -811,8 +836,6 @@ static bool ghes_do_proc(struct ghes *ghes,
 			current->comm, task_pid_nr(current));
 		force_sig(SIGBUS);
 	}
-
-	return queued;
 }
 
 static void __ghes_print_estatus(const char *pfx,
@@ -1114,9 +1137,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	struct ghes_estatus_node *estatus_node;
 	struct acpi_hest_generic *generic;
 	struct acpi_hest_generic_status *estatus;
-	bool task_work_pending;
 	u32 len, node_len;
-	int ret;
 
 	llnode = llist_del_all(&ghes_estatus_llist);
 	/*
@@ -1131,25 +1152,16 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 		len = cper_estatus_len(estatus);
 		node_len = GHES_ESTATUS_NODE_LEN(len);
-		task_work_pending = ghes_do_proc(estatus_node->ghes, estatus);
+
+		ghes_do_proc(estatus_node->ghes, estatus);
+
 		if (!ghes_estatus_cached(estatus)) {
 			generic = estatus_node->generic;
 			if (ghes_print_estatus(NULL, generic, estatus))
 				ghes_estatus_cache_add(generic, estatus);
 		}
-
-		if (task_work_pending && current->mm) {
-			estatus_node->task_work.func = ghes_kick_task_work;
-			estatus_node->task_work_cpu = smp_processor_id();
-			ret = task_work_add(current, &estatus_node->task_work,
-					    TWA_RESUME);
-			if (ret)
-				estatus_node->task_work.func = NULL;
-		}
-
-		if (!estatus_node->task_work.func)
-			gen_pool_free(ghes_estatus_pool,
-				      (unsigned long)estatus_node, node_len);
+		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+			      node_len);
 
 		llnode = next;
 	}
@@ -1210,7 +1222,6 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
 
 	estatus_node->ghes = ghes;
 	estatus_node->generic = ghes->generic;
-	estatus_node->task_work.func = NULL;
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 
 	if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) {
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index be1dd4c1a917..ebd21b05fe6e 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -35,9 +35,6 @@ struct ghes_estatus_node {
 	struct llist_node llnode;
 	struct acpi_hest_generic *generic;
 	struct ghes *ghes;
-
-	int task_work_cpu;
-	struct callback_head task_work;
 };
 
 struct ghes_estatus_cache {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b1c3db9cf355..850165eaebfe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3947,7 +3947,6 @@ enum mf_flags {
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
-extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14c316d7d38d..e0adb665d07b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2499,19 +2499,6 @@ static void memory_failure_work_func(struct work_struct *work)
 	}
 }
 
-/*
- * Process memory_failure work queued on the specified CPU.
- * Used to avoid return-to-userspace racing with the memory_failure workqueue.
- */
-void memory_failure_queue_kick(int cpu)
-{
-	struct memory_failure_cpu *mf_cpu;
-
-	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
-	cancel_work_sync(&mf_cpu->work);
-	memory_failure_work_func(&mf_cpu->work);
-}
-
 static int __init memory_failure_init(void)
 {
 	struct memory_failure_cpu *mf_cpu;
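
The callback pattern used in patch 3/3 can be tried out in user space.
The sketch below is an illustration only, under stated assumptions, and
is not the kernel implementation: struct callback_head and container_of
are simplified stand-ins for the kernel definitions, needs_force_sigbus(),
demo_cb() and run_task_work() are hypothetical helpers, and the fallback
value 133 for EHWPOISON is the asm-generic number, which some
architectures do not use. It shows how the callback recovers its
enclosing struct ghes_task_work from the embedded callback_head, and
which memory_failure() return values leave the task without a forced
SIGBUS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: asm-generic value, only used if libc does not define it. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* User-space stand-in for the kernel's struct callback_head. */
struct callback_head {
	void (*func)(struct callback_head *head);
};

/* Same idea as the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mirrors the fields of struct ghes_task_work from the patch. */
struct ghes_task_work {
	struct callback_head twork;
	unsigned long long pfn;
	int flags;
};

/*
 * Hypothetical helper: true when memory_failure_cb() in the patch would
 * fall through to force_sig(SIGBUS) for this return value.
 */
static bool needs_force_sigbus(int ret)
{
	return !(ret == 0 || ret == -EHWPOISON || ret == -EOPNOTSUPP);
}

/* Values recovered by the demo callback, for inspection by the caller. */
static unsigned long long seen_pfn;
static int seen_flags;

static void demo_cb(struct callback_head *head)
{
	struct ghes_task_work *twcb =
		container_of(head, struct ghes_task_work, twork);

	seen_pfn = twcb->pfn;
	seen_flags = twcb->flags;
}

/* Stand-in for the kernel running pending task work before user return. */
static void run_task_work(struct callback_head *head)
{
	assert(head && head->func);
	head->func(head);
}
```

Embedding the callback_head in a small dedicated structure is what lets
the patch drop the task_work and task_work_cpu members from
struct ghes_estatus_node: the pfn and flags travel with the queued work
itself.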