From patchwork Fri Mar 25 20:14:28 2022
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 12792061
Date: Fri, 25 Mar 2022 16:14:28 -0400
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, kernel-team@fb.com, Oscar Salvador, Miaohe Lin, Naoya Horiguchi, Mel Gorman, Johannes Weiner, Andrew Morton, stable@vger.kernel.org
Subject: [PATCH] mm,hwpoison: unmap poisoned page before invalidation
Message-ID: <20220325161428.5068d97e@imladris.surriel.com>

In some cases it appears the invalidation of a hwpoisoned page fails
because the page is still mapped in another process. This can cause a
program to be continuously restarted and die when it page faults on the
page that was not invalidated. Avoid that problem by unmapping the
hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault, if
the code tries to do something with the now-NULL vmf->page. I did not
hit this error when submitting the previous patch because there are
several opportunities for alloc_set_pte to bail out before accessing
vmf->page, and that apparently happened on those systems, and most of
the time on other systems, too.

However, across several million systems that error does occur a handful
of times a day.
It can be avoided by returning VM_FAULT_NOPAGE, which will cause
do_read_fault to return before calling finish_fault.

Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Cc: Oscar Salvador
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Andrew Morton
Cc: stable@vger.kernel.org
Reviewed-by: Miaohe Lin
Tested-by: Naoya Horiguchi
Reviewed-by: Oscar Salvador
---
 mm/memory.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index be44d0b36b18..76e3af9639d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}
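[ Editor's note: the sketch below illustrates why returning VM_FAULT_NOPAGE
  from __do_fault avoids the finish_fault oops described above. It is a
  simplified, illustrative sketch, not the upstream mm/memory.c source; the
  name do_read_fault_sketch and the reduced function body are assumptions
  made for illustration only. ]

static vm_fault_t do_read_fault_sketch(struct vm_fault *vmf)
{
	vm_fault_t ret;

	/*
	 * With this patch, __do_fault() may return VM_FAULT_NOPAGE with
	 * vmf->page set to NULL after a hwpoisoned page cache page was
	 * successfully invalidated.
	 */
	ret = __do_fault(vmf);
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
		return ret;	/* VM_FAULT_NOPAGE bails out here... */

	/*
	 * ...so finish_fault() is never reached with a NULL vmf->page.
	 * Returning 0 from __do_fault(), as before this patch, could let
	 * execution fall through to here and oops on the NULL page.
	 */
	ret |= finish_fault(vmf);
	unlock_page(vmf->page);
	return ret;
}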