From patchwork Mon Sep 28 17:54:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 11804479
From: Zi Yan
To: linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Roman Gushchin, Rik van Riel, Matthew Wilcox,
 Shakeel Butt, Yang Shi, Jason Gunthorpe, Mike Kravetz, Michal Hocko,
 David Hildenbrand, William Kucharski, Andrea Arcangeli, John Hubbard,
 David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH v2 20/30] mm: page_vma_walk: teach it about PMD-mapped PUD THP.
Date: Mon, 28 Sep 2020 13:54:18 -0400
Message-Id: <20200928175428.4110504-21-zi.yan@sent.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200928175428.4110504-1-zi.yan@sent.com>
References: <20200928175428.4110504-1-zi.yan@sent.com>
Reply-To: Zi Yan
MIME-Version: 1.0

From: Zi Yan

We now have PMD-mapped and PTE-mapped PUD THPs, so page_vma_mapped_walk()
must handle both mapping types properly.
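For context, rmap-side callers consume the walker in a loop like the one
sketched below. This is an illustrative sketch only, not part of this patch;
page, vma, and address are assumed to be supplied by the caller:

	struct page_vma_mapped_walk pvmw = {
		.page = page,
		.vma = vma,
		.address = address,
	};

	while (page_vma_mapped_walk(&pvmw)) {
		if (pvmw.pte) {
			/* PTE mapping: one iteration per PTE entry. */
		} else {
			/*
			 * PMD leaf entry; with this patch, a PMD-mapped
			 * PUD THP yields one iteration per PMD entry
			 * covering the compound page.
			 */
		}
	}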
Signed-off-by: Zi Yan
---
 mm/page_vma_mapped.c | 152 +++++++++++++++++++++++++++++++++----------
 1 file changed, 118 insertions(+), 34 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f88e845ad5e6..5a3c1b561ff5 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -7,6 +7,12 @@
 
 #include "internal.h"
 
+enum check_pmd_result {
+	PVM_NOT_MAPPED = 0,
+	PVM_LEAF_ENTRY,
+	PVM_NONLEAF_ENTRY,
+};
+
 static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 {
 	page_vma_mapped_walk_done(pvmw);
@@ -52,6 +58,22 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 	return true;
 }
 
+static bool map_pmd(struct page_vma_mapped_walk *pvmw)
+{
+	pmd_t pmde;
+
+	pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address);
+	pmde = READ_ONCE(*pvmw->pmd);
+	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
+		pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd);
+		return true;
+	} else if (!pmd_present(pmde))
+		return false;
+
+	pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd);
+	return true;
+}
+
 static inline bool pfn_is_match(struct page *page, unsigned long pfn)
 {
 	unsigned long page_pfn = page_to_pfn(page);
@@ -115,6 +137,57 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 	return pfn_is_match(pvmw->page, pfn);
 }
 
+/**
+ * check_pmd - check if @pvmw->page is mapped at @pvmw->pmd
+ *
+ * page_vma_mapped_walk() found a place where @pvmw->page is *potentially*
+ * mapped. check_pmd() has to validate this.
+ *
+ * @pvmw->pmd may point to an empty PMD, a migration PMD, a PMD pointing to
+ * an arbitrary huge page, or a PMD pointing to a PTE page table page.
+ *
+ * If the PVMW_MIGRATION flag is set, returns PVM_LEAF_ENTRY if @pvmw->pmd
+ * contains a migration entry that points to @pvmw->page.
+ *
+ * If the PVMW_MIGRATION flag is not set, returns PVM_LEAF_ENTRY if @pvmw->pmd
+ * points to @pvmw->page.
+ *
+ * If @pvmw->pmd points to a PTE page table page, returns PVM_NONLEAF_ENTRY.
+ *
+ * Otherwise, returns PVM_NOT_MAPPED.
+ *
+ */
+static enum check_pmd_result check_pmd(struct page_vma_mapped_walk *pvmw)
+{
+	unsigned long pfn;
+
+	if (likely(pmd_trans_huge(*pvmw->pmd))) {
+		if (pvmw->flags & PVMW_MIGRATION)
+			return PVM_NOT_MAPPED;
+		pfn = pmd_pfn(*pvmw->pmd);
+		if (!pfn_is_match(pvmw->page, pfn))
+			return PVM_NOT_MAPPED;
+		return PVM_LEAF_ENTRY;
+	} else if (!pmd_present(*pvmw->pmd)) {
+		if (thp_migration_supported()) {
+			if (!(pvmw->flags & PVMW_MIGRATION))
+				return PVM_NOT_MAPPED;
+			if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
+				swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+
+				pfn = migration_entry_to_pfn(entry);
+				if (!pfn_is_match(pvmw->page, pfn))
+					return PVM_NOT_MAPPED;
+				return PVM_LEAF_ENTRY;
+			}
+		}
+		return PVM_NOT_MAPPED;
+	}
+	/* THP pmd was split under us: handle on pte level */
+	spin_unlock(pvmw->ptl);
+	pvmw->ptl = NULL;
+	return PVM_NONLEAF_ENTRY;
+}
+
 /**
  * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
  * @pvmw->address
@@ -146,14 +219,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t pude;
-	pmd_t pmde;
+	enum check_pmd_result pmd_check_res;
 
 	if (!pvmw->pte && !pvmw->pmd && pvmw->pud)
 		return not_found(pvmw);
 
 	/* The only possible pmd mapping has been handled on last iteration */
 	if (pvmw->pmd && !pvmw->pte)
-		return not_found(pvmw);
+		goto next_pmd;
 
 	if (pvmw->pte)
 		goto next_pte;
@@ -202,42 +275,47 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	} else if (!pud_present(pude))
 		return false;
 
-	pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address);
-	/*
-	 * Make sure the pmd value isn't cached in a register by the
-	 * compiler and used as a stale value after we've observed a
-	 * subsequent update.
-	 */
-	pmde = READ_ONCE(*pvmw->pmd);
-	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
-		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-		if (likely(pmd_trans_huge(*pvmw->pmd))) {
-			if (pvmw->flags & PVMW_MIGRATION)
-				return not_found(pvmw);
-			if (pmd_page(*pvmw->pmd) != page)
-				return not_found(pvmw);
+	if (!map_pmd(pvmw))
+		goto next_pmd;
+	/* pmd locked after map_pmd */
+	while (1) {
+		pmd_check_res = check_pmd(pvmw);
+		if (pmd_check_res == PVM_LEAF_ENTRY)
 			return true;
-		} else if (!pmd_present(*pvmw->pmd)) {
-			if (thp_migration_supported()) {
-				if (!(pvmw->flags & PVMW_MIGRATION))
-					return not_found(pvmw);
-				if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
-					swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
-
-					if (migration_entry_to_page(entry) != page)
-						return not_found(pvmw);
-					return true;
+		else if (pmd_check_res == PVM_NONLEAF_ENTRY)
+			goto pte_level;
+next_pmd:
+		/* Only a PMD-mapped PUD THP has a next pmd. */
+		if (!(PageTransHuge(pvmw->page) &&
+		      compound_order(pvmw->page) == HPAGE_PUD_ORDER))
+			return not_found(pvmw);
+		do {
+			pvmw->address += HPAGE_PMD_SIZE;
+			if (pvmw->address >= pvmw->vma->vm_end ||
+			    pvmw->address >=
+			    __vma_address(pvmw->page, pvmw->vma) +
+			    thp_nr_pages(pvmw->page) * PAGE_SIZE)
+				return not_found(pvmw);
+			/* Did we cross page table boundary? */
+			if (pvmw->address % PUD_SIZE == 0) {
+				/*
+				 * Reset pmd here, so we will not stay at PMD
+				 * level after restart.
+				 */
+				pvmw->pmd = NULL;
+				if (pvmw->ptl) {
+					spin_unlock(pvmw->ptl);
+					pvmw->ptl = NULL;
 				}
+				goto restart;
+			} else {
+				pvmw->pmd++;
 			}
-			return not_found(pvmw);
-		} else {
-			/* THP pmd was split under us: handle on pte level */
-			spin_unlock(pvmw->ptl);
-			pvmw->ptl = NULL;
-		}
-	} else if (!pmd_present(pmde)) {
-		return false;
+		} while (pmd_none(*pvmw->pmd));
+
+		if (!pvmw->ptl)
+			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
 	}
+pte_level:
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
@@ -257,6 +335,12 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		/* Did we cross page table boundary? */
 		if (pvmw->address % PMD_SIZE == 0) {
 			pte_unmap(pvmw->pte);
+			/*
+			 * In the case of a PTE-mapped PUD THP, the next
+			 * entry can be a PMD. Reset pte here, so we will
+			 * not stay at PTE level after restart.
+			 */
+			pvmw->pte = NULL;
 			if (pvmw->ptl) {
 				spin_unlock(pvmw->ptl);
 				pvmw->ptl = NULL;
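As a sanity check on the next_pmd stepping above: with the usual x86_64
constants (2MB HPAGE_PMD_SIZE, 1GB PUD_SIZE), a PMD-mapped PUD THP is covered
by 512 PMD leaf entries, and the pvmw->address % PUD_SIZE == 0 test fires
exactly when the walk steps past the last of them. A minimal userspace sketch
of that arithmetic (the constants and loop below are illustrative, not kernel
code):

	#include <stdio.h>

	/* Illustrative x86_64 values; not taken from kernel headers. */
	#define HPAGE_PMD_SIZE	(2UL << 20)	/* 2MB per PMD leaf entry */
	#define PUD_SIZE	(1UL << 30)	/* 1GB per PUD entry */

	int main(void)
	{
		/* Hypothetical PUD-aligned start of a PUD THP mapping. */
		unsigned long addr = 0x40000000UL;
		int steps = 0;

		/* Mirror the do/while in next_pmd: one PMD at a time. */
		do {
			addr += HPAGE_PMD_SIZE;
			steps++;
		} while (addr % PUD_SIZE != 0);

		/* Prints 512: the PMD entries covering one 1GB PUD THP. */
		printf("%d PMD steps before crossing a PUD boundary\n", steps);
		return 0;
	}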