From patchwork Tue Apr 11 10:48:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shuai Xue X-Patchwork-Id: 13207340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EAA39C76196 for ; Tue, 11 Apr 2023 10:49:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 704D46B0072; Tue, 11 Apr 2023 06:49:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6B471900002; Tue, 11 Apr 2023 06:49:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 57C306B0074; Tue, 11 Apr 2023 06:49:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 45AD0900002 for ; Tue, 11 Apr 2023 06:49:00 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id DF1DF1405AD for ; Tue, 11 Apr 2023 10:48:59 +0000 (UTC) X-FDA: 80668787598.19.96B01FA Received: from out30-98.freemail.mail.aliyun.com (out30-98.freemail.mail.aliyun.com [115.124.30.98]) by imf29.hostedemail.com (Postfix) with ESMTP id BD499120017 for ; Tue, 11 Apr 2023 10:48:56 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf29.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.98 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681210138; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zyOAh51wIg0Dd+abVoexP6Q6fJ6YqRoOOwHNIxDaKuQ=; b=rBmBS8KfxOJjBRzR+Koxuiud3x7FHH0q+NxtFAC9HSe/9usU/eI+YolEHjP9jiOoeqm1od ygY7lRlJboG+758YxO46cR2EbNj/2bAOQsVzTc44H33BiQ+iKIc7pp9fiIclf+qtU5D/xO gwn1BNPrqT2XydDfcITJ5jb3JqmgyYY= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf29.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.98 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681210138; a=rsa-sha256; cv=none; b=ZezWpHKwZI0cLmtN6JfQaWBtpCM0YiPSQh9BCbOLy+xj4q66dQuEHt6aaeS5QuXSnpnqKd QGmriIfT5psnZmwfmOmiuKJWpvIQV74ffHlhBMgrOoLidZpN/yrhYMhkcd058vdabvpmSz McF9XSMzl2SZdP1pjS+L8+Z+C/fQuJc= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=xueshuai@linux.alibaba.com;NM=1;PH=DS;RN=24;SR=0;TI=SMTPD_---0Vfs-ska_1681210127; Received: from localhost.localdomain(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0Vfs-ska_1681210127) by smtp.aliyun-inc.com; Tue, 11 Apr 2023 18:48:51 +0800 From: Shuai Xue To: tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, akpm@linux-foundation.org, ardb@kernel.org, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, bp@alien8.de, cuibixuan@linux.alibaba.com, dave.hansen@linux.intel.com, james.morse@arm.com, jarkko@kernel.org, lenb@kernel.org, linmiaohe@huawei.com, lvying6@huawei.com, rafael@kernel.org, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v5 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Date: Tue, 11 Apr 2023 18:48:40 +0800 Message-Id: 
<20230411104842.37079-1-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: BD499120017 X-Rspam-User: X-Stat-Signature: fdndanqe4om4ma89357o7bmcd49pqpnw X-HE-Tag: 1681210136-158142 X-HE-Meta: U2FsdGVkX18qahdeXQtEGu8MEHJC4EIMD9IlvhGma68iumijX9xgOiQTjzB4ZK8mbanR1YDQSnWXqxCen3ZFfWTu/gdS0kQZmLTzYVuiISyOLCS4JLmuUQmujgfZGtYZjaXFFBCjHm+E+d7zpWEICNoVyDA7k6dMjs+Wutzbur/txD865DsxuOfbKW9rfBaL+C9N10Fq0St+wUSP3/4KwGq6jrHsdLRcReMFVyphrR048olxepWlMKIHpb/0VbtIWQjKNpHqsI6pTbhqO8HzfckdXI3/XxAjZLaGBfE75eEeLu4798qri5zE6wEL0ZdNKO/VNdn/aSVBoJ+AW/NTDp390pSs9NbGuHxc3n47uHgxv16wOY3XtEKtyNHcqEdIgIdiBJR9pRZwOV14nPnvIHk8FGfg6arto1kMm3ExdrpHkgXMbIMAcKZnxPUxb6yVgjW2VV3pvdRhIcpZo4kBWqFyI29aLocfXZC4Trv4VDa+wLp34pZTVBOkmrC/+vuY0AiWAa3ZYRXKWSCNR6V2KcAKUkI2oEhALBBqBayOvD5C/BVSin04L7xURqegP9MddfI+tNHkhYGaBRhUJtcTLVvHjuhVsrkchzm5eLUqMMSaNhBcdZx2er26cJ/1oOW8XYpYwHZqqxUrZpGkGCjbQvSonJVY5c1Dj9XZ4k5ttdeLIpzPvIv85XWEOqpwN8SbWNU2axr7qW8eKbeL15POQnl+HC5UwShYPNCBsVyYLy5ko5XaM68XsbKBz5x9NmkDuatRDgE9gOJtPRYyKN/ejMyG+mvVMU7lxuubLxxRKUkuz3VZUEnMEuMO3q/TS4BrRFm02E97s5L3IlGwRhqktUiewR9RrhNkssM7xYVtUTNh6RKZj4kX3NquMpeiL4yaqyexsB/xUtwL9oC56/sjg8cZ/LYN+WyyZ5eYnRRhVAduCrjg82Dhdrl4PfPI6S8bZbJgmIDJful6PJUVOkc Bc4oU/53 EJW2tk0IipU1/GdxW+c7IGb9dda9k0eRpXSzREA0l2sRLKG+WBrFaWrZb1wjjb96ArP2PBqwlx2TMX9IUXztRJhNTDw5oddNXRveixXafo6Z92ofhwh7mb4aPlV2Y6BhRu7MksDHlV6L0ovTM3MJRz0Y4VETGaS+XUwNiAnMbMtp6CGH2rCh7O9+I0PBsaU9kpVa9hEIRfCaMzb52+PKtCBhwD5tuu7wMSuarYouAkBS+WPnks1C4CcQ/5Fvr17kkXMHmmW2GMrNJooylOouQvt3DMgVRdLISGfgin9tw9ZFcq5Y= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: changes since v4 by addressing comments from Xiaofei: - do a force kill only for abnormal sync errors - Link: 
https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/

changes since v3 by addressing comments from Xiaofei:
- do a force kill for abnormal memory failure errors such as invalid PA,
  unexpected severity, OOM, etc.
- pick up Tested-by tag from Ma Wupeng
- Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/

changes since v2 by addressing comments from Naoya:
- rename mce_task_work to sync_task_work
- drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
- add steps to reproduce this problem in cover letter
- Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/

changes since v1:
- distinguish synchronous events by notify type
- Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/

Currently, both synchronous and asynchronous errors are queued and handled
by a dedicated kthread in a workqueue. Memory failure handling for
synchronous errors is synchronized by a cancel_work_sync trick, which
ensures that the corrupted page is unmapped and poisoned. After returning
to user space, the task resumes at the current instruction, which triggers
a page fault in which the kernel sends SIGBUS to the current process due to
VM_FAULT_HWPOISON.

However, memory failure recovery for hwpoison-aware mechanisms does not
work as expected. For example, hwpoison-aware user-space processes like
QEMU register their own SIGBUS handler and enable early kill mode by
setting PF_MCE_EARLY at initialization. The kernel then notifies the
process directly from memory failure handling by sending a SIGBUS signal
with the wrong si_code: the user-space process receives BUS_MCEERR_AO
instead of BUS_MCEERR_AR.
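For illustration only (this is not part of the series): a minimal
user-space sketch of how a hwpoison-aware process might opt into early
kill and tell the two si_code values apart. prctl(PR_MCE_KILL, ...) and
the BUS_MCEERR_AR/BUS_MCEERR_AO constants are existing Linux interfaces;
the names classify_sigbus(), enable_early_kill() and enum mce_action are
hypothetical, and what the handler does for each code is an assumption,
not a description of QEMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for libc headers that lack the constants (Linux UAPI values). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* The "code 4" / "code 5" values printed by the reproducer. */
static_assert(BUS_MCEERR_AR == 4 && BUS_MCEERR_AO == 5,
	      "unexpected SIGBUS si_code values");

/* Hypothetical classification used only by this sketch. */
enum mce_action { MCE_NOT_MCE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

enum mce_action classify_sigbus(const siginfo_t *info)
{
	if (info == NULL || info->si_signo != SIGBUS)
		return MCE_NOT_MCE;
	switch (info->si_code) {
	case BUS_MCEERR_AR:	/* poison consumed in this context */
		return MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:	/* poison detected, not yet consumed */
		return MCE_ACTION_OPTIONAL;
	default:
		return MCE_NOT_MCE;
	}
}

static void report(const char *msg)
{
	ssize_t n = write(STDERR_FILENO, msg, strlen(msg));
	(void)n;	/* nothing useful to do on failure inside a handler */
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	switch (classify_sigbus(info)) {
	case MCE_ACTION_REQUIRED:
		/* Assumption: give up, since returning would retry the access. */
		report("SIGBUS: BUS_MCEERR_AR, cannot continue\n");
		_exit(1);
	case MCE_ACTION_OPTIONAL:
		/* Assumption: note the event and keep running. */
		report("SIGBUS: BUS_MCEERR_AO, page should be avoided\n");
		return;
	default:
		report("SIGBUS: not a memory failure\n");
		_exit(1);
	}
}

/* Install the handler, then request early kill for this process. */
int enable_early_kill(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A process would call enable_early_kill() once at start-up. With the
behaviour described above, such a handler takes the "optional" branch even
when the poisoned data was consumed synchronously, so it keeps running as
if the access had not faulted.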
To address this problem:

- PATCH 1 sets mf_flags to MF_ACTION_REQUIRED on synchronous events, which
  indicates that the error happened in the current execution context
- PATCH 2 separates synchronous error handling into task work so that the
  current context in memory failure is exactly the task consuming the
  poisoned data. The kernel then sends SIGBUS with the proper si_code in
  kill_proc().

Lv Ying and XiuQi also proposed to address a similar problem, and we
discussed a new solution: adding a new flag
(acpi_hest_generic_data::flags bit 8) to distinguish synchronous events.
[2][3] The UEFI community has not responded yet. After a deep dive into the
SDEI TRM, we conclude that SDEI notification should be used for
asynchronous errors. As the SDEI TRM [1] describes, "the dispatcher can
simulate an exception-like entry into the client, **with the client
providing an additional asynchronous entry point similar to an interrupt
entry point**". The client (kernel) lacks complete synchronous context,
e.g. system registers (ELR, ESR, etc.). So the notify type is enough to
distinguish synchronous events.

To reproduce this problem:

	# STEP1: enable early kill mode
	#sysctl -w vm.memory_failure_early_kill=1
	vm.memory_failure_early_kill = 1

	# STEP2: inject an UCE error and consume it to trigger a synchronous error
	#einj_mem_uc single
	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
	injecting ...
	triggering ...
	signal 7 code 5 addr 0xffffb0d75000
	page not present
	Test passed

The si_code (code 5) from einj_mem_uc indicates a BUS_MCEERR_AO error,
which is wrong: the poisoned data was consumed synchronously, so
BUS_MCEERR_AR is expected.

After this patch set:

	# STEP1: enable early kill mode
	#sysctl -w vm.memory_failure_early_kill=1
	vm.memory_failure_early_kill = 1

	# STEP2: inject an UCE error and consume it to trigger a synchronous error
	#einj_mem_uc single
	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
	injecting ...
	triggering ...
	signal 7 code 4 addr 0xffffb0d75000
	page not present
	Test passed

The si_code (code 4) from einj_mem_uc indicates a BUS_MCEERR_AR error, as
expected.

[1] https://developer.arm.com/documentation/den0054/latest/
[2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
[3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/

Shuai Xue (2):
  ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
    synchronous events
  ACPI: APEI: handle synchronous exceptions in task work

 drivers/acpi/apei/ghes.c | 120 +++++++++++++++++++++++++++------------
 include/acpi/ghes.h      |   3 -
 mm/memory-failure.c      |  13 -----
 3 files changed, 84 insertions(+), 52 deletions(-)