From patchwork Thu Jan 13 23:11:17 2022
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12713161
From: Naoya Horiguchi
To: linux-mm@kvack.org
Cc: Andrew Morton, Tony Luck, Youquan Song, Naoya Horiguchi, linux-kernel@vger.kernel.org
Subject: [PATCH v2] mm/hwpoison: Fix error page recovered but reported "not recovered"
Date: Fri, 14 Jan 2022 08:11:17 +0900
Message-Id: <20220113231117.1021405-1-naoya.horiguchi@linux.dev>

From: Naoya Horiguchi

When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when the
data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(), and the machine check
processing code then finds the page already poisoned.
The machine check path then calls kill_accessing_process() to make sure a
SIGBUS is sent, but that function returns the wrong error code.

Console log looks like this:

  [34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
  [34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
  [34775.690310] Memory failure: 0x3710b3: already hardware poisoned
  [34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
  [34775.706072] mce: Memory error not recovered

kill_accessing_process() is supposed to return -EHWPOISON to signal that a
SIGBUS has already been sent to the process, so that kill_me_maybe() does
not have to send it again. The current code fails to do this; fix it so
that it works as intended. This change avoids the noisy "Memory error not
recovered" message and skips the duplicate SIGBUS.

[Tony: Reworded some parts of commit message]

Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Reported-by: Youquan Song
Cc: Tony Luck
Signed-off-by: Naoya Horiguchi
---
 mm/memory-failure.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14ae5c18e776..4c9bd1d37301 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
+	else
+		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+	return ret > 0 ? -EHWPOISON : -EFAULT;
 }
 
 static const char *action_name[] = {