From patchwork Thu Mar 9 17:46:36 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Punit Agrawal
X-Patchwork-Id: 9613775
From: Punit Agrawal
To: "Baicar\, Tyler"
Subject: Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
References: <1487720205-14594-1-git-send-email-tbaicar@codeaurora.org>
 <87wpc7o7mo.fsf@e105922-lin.cambridge.arm.com>
 <874lz4oo80.fsf@e105922-lin.cambridge.arm.com>
Date: Thu, 09 Mar 2017 17:46:36 +0000
In-Reply-To: (Tyler Baicar's message of "Tue, 7 Mar 2017 13:28:01 -0700")
Message-ID: <87efy6mjgj.fsf@e105922-lin.cambridge.arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Cc: mark.rutland@arm.com, "Jonathan \(Zhixiong\) Zhang",
 catalin.marinas@arm.com, Steve Capper, will.deacon@arm.com,
 linux-kernel@vger.kernel.org, shijie.huang@arm.com,
 paul.gortmaker@windriver.com, james.morse@arm.com,
 sandeepa.s.prabhu@gmail.com, akpm@linux-foundation.org,
 linux-arm-kernel@lists.infradead.org

[ +steve for arm64 mm and hugepages chops ]

"Baicar, Tyler" writes:

> On 3/7/2017 12:56 PM, Punit Agrawal wrote:
>> Punit Agrawal writes:
>>
>> [...]
>>
>>> The code looks good but I ran into some failures while running the
>>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>>> in dmesg -
>>>
>>> [ 344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>>
>>> I suspect that this is due to the huge pte accessors not correctly
>>> dealing with poisoned entries (which are represented as swap entries).
>> I think I've got to the bottom of the issue - the problem is due to
>> huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
>> due to pmd_present() not handling poisoned pmd entries correctly)
>>
>> The following is the call chain for the failure case.
>>
>> do_munmap
>>   unmap_region
>>     unmap_vmas
>>       unmap_single_vma
>>         __unmap_hugepage_range_final # The test case uses hugepages
>>           __unmap_hugepage_range
>>             huge_pte_offset # Returns NULL for a poisoned pmd
>>
>> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
>> pmd_mknotpresent()") fixes the problem for me but I don't think that is
>> the right fix.
>>
>> While I work on a proper fix, it would be great if you can confirm that
>> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
> Thanks Punit! I haven't got a chance to do this yet, but I will let
> you know once I get it tested :)

This time with a patch. Please test this instead.

After a lot of head scratching, I've bit the bullet and added a check to
return the poisoned entry from huge_pte_offset(). What with having to
deal with contiguous hugepages et al., there just doesn't seem to be any
leeway in how we handle the situation here.
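
To make the check easier to reason about, here is a minimal user-space
model of it. This is a sketch only, not kernel code: the `model_*` names,
the `MODEL_VALID` / `MODEL_PROT_NONE` bit positions and the flat 64-bit
entry type are assumptions made for the illustration and do not claim to
match the real arm64 page table layout. It only shows the three cases the
lookup has to distinguish: present, none, and "not present but not none"
(how a swap-style poisoned entry appears).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry type; not the real arm64 pmd_t. */
typedef uint64_t model_pmd_t;

/* Assumed bit positions, chosen for the illustration only. */
#define MODEL_VALID     (UINT64_C(1) << 0)   /* entry maps memory */
#define MODEL_PROT_NONE (UINT64_C(1) << 58)  /* mapped, but no access */

/* An entry is "none" when it is completely empty. */
int model_pmd_none(model_pmd_t pmd)
{
	return pmd == 0;
}

/* "Present" looks only at the VALID and PROT_NONE bits. */
int model_pmd_present(model_pmd_t pmd)
{
	return (pmd & (MODEL_VALID | MODEL_PROT_NONE)) != 0;
}

/* Lookup without the extra check: every non-present entry is dropped. */
model_pmd_t *model_lookup_old(model_pmd_t *pmd)
{
	return model_pmd_present(*pmd) ? pmd : NULL;
}

/* Lookup with the extra check: non-present, non-empty entries survive. */
model_pmd_t *model_lookup_new(model_pmd_t *pmd)
{
	if (!model_pmd_present(*pmd))
		return !model_pmd_none(*pmd) ? pmd : NULL;
	return pmd;
}

/* Walks the three cases; aborts via assert() if the model misbehaves. */
void model_check(void)
{
	model_pmd_t none = 0;
	model_pmd_t mapped = MODEL_VALID | UINT64_C(0x40000000);
	/* Value taken from the "bad pmd" message quoted in this thread. */
	model_pmd_t poisoned = UINT64_C(0x83af00074);

	assert(model_lookup_old(&mapped) == &mapped);
	assert(model_lookup_new(&mapped) == &mapped);
	assert(model_lookup_old(&none) == NULL);
	assert(model_lookup_new(&none) == NULL);
	assert(model_lookup_old(&poisoned) == NULL);      /* entry lost */
	assert(model_lookup_new(&poisoned) == &poisoned); /* entry kept */
}
```

In this model the only case where the two lookups differ is the
non-present, non-empty entry, which is the case the generic hugetlb code
needs to see in order to handle it.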
Let's see if there are any other ideas.

Patch follows.

Thanks,
Punit

----------->8-------------
From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
From: Punit Agrawal
Date: Thu, 9 Mar 2017 16:16:29 +0000
Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd

When memory failure is enabled, a poisoned hugepage PMD is marked as a
swap entry. As pmd_present() only checks for VALID and PROT_NONE
bits (turned off for swap entries), it causes huge_pte_offset() to
return NULL for poisoned PMDs.

This behaviour of huge_pte_offset() leads to errors such as the one
below when munmap is called on poisoned hugepages.

[ 344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned PMD which is then
appropriately handled by the generic layer code.

Signed-off-by: Punit Agrawal
Cc: Catalin Marinas
Cc: Steve Capper
---
 arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

-- 
2.11.0

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e25584d72396..9263f206353c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 	if (pud_huge(*pud))
 		return (pte_t *)pud;
 	pmd = pmd_offset(pud, addr);
+
+	/*
+	 * In case of HW Poisoning, a hugepage pmd can contain
+	 * poisoned entries. Poisoned entries are marked as swap
+	 * entries.
+	 *
+	 * For pmds that are not present, check to see if it could be
+	 * a swap entry (!present and !none) before giving up.
+	 */
 	if (!pmd_present(*pmd))
-		return NULL;
+		return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;
 
 	if (pte_cont(pmd_pte(*pmd))) {
 		pmd = pmd_offset(