From patchwork Tue Sep 25 07:13:33 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10613501
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Tue, 25 Sep 2018 15:13:33 +0800
Message-Id: <20180925071348.31458-7-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

A huge PMD needs to be split when zapping part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, it needs to be split too.
This patch implements support for this.  It is similar to splitting a
PMD page mapping, except that we also need to decrease the PMD swap
mapping count of the huge swap cluster.  If the PMD swap mapping count
becomes 0, the huge swap cluster will be split.

Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so the pmd_present() check is called before them.
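In outline, the split works as follows (a simplified sketch of the
logic only, not the patched code; the real __split_huge_swap_pmd()
below also handles the soft-dirty bit and the deposited page table):

	/* one PMD swap mapping becomes HPAGE_PMD_NR PTE swap mappings */
	entry = pmd_to_swp_entry(*pmd);
	split_swap_cluster_map(entry);	/* decrease PMD swap map count */
	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) {
		/* each PTE gets the swap entry of one consecutive slot */
		pte = pte_offset_map(&_pmd, haddr);
		set_pte_at(mm, haddr, pte, swp_entry_to_pte(entry));
		pte_unmap(pte);
	}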
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h |  4 ++++
 include/linux/swap.h    |  6 ++++++
 mm/huge_memory.c        | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 mm/swapfile.c           | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 99c19b06d9a4..0f3e1739986f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page)
 	return READ_ONCE(huge_zero_page) == page;
 }
 
+/*
+ * is_huge_zero_pmd() must be called after checking pmd_present(),
+ * otherwise, it may report false positive for PMD swap entry.
+ */
 static inline bool is_huge_zero_pmd(pmd_t pmd)
 {
 	return is_huge_zero_page(pmd_page(pmd));
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 88eb06eb1444..5bd54b6fd4a1 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #ifdef CONFIG_THP_SWAP
 extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster_map(swp_entry_t entry);
 #else
 static inline int split_swap_cluster(swp_entry_t entry)
 {
 	return 0;
 }
+
+static inline int split_swap_cluster_map(swp_entry_t entry)
+{
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 63edf18ca9cc..3ea7318fcdcd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1604,6 +1604,40 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	return 0;
 }
 
+/* Convert a PMD swap mapping to a set of PTE swap mappings */
+static void __split_huge_swap_pmd(struct vm_area_struct *vma,
+				  unsigned long haddr,
+				  pmd_t *pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgtable_t pgtable;
+	pmd_t _pmd;
+	swp_entry_t entry;
+	int i, soft_dirty;
+
+	entry = pmd_to_swp_entry(*pmd);
+	soft_dirty = pmd_soft_dirty(*pmd);
+
+	split_swap_cluster_map(entry);
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) {
+		pte_t *pte, ptent;
+
+		pte = pte_offset_map(&_pmd, haddr);
+		VM_BUG_ON(!pte_none(*pte));
+		ptent = swp_entry_to_pte(entry);
+		if (soft_dirty)
+			ptent = pte_swp_mksoft_dirty(ptent);
+		set_pte_at(mm, haddr, pte, ptent);
+		pte_unmap(pte);
+	}
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -2070,7 +2104,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
+	VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd)
 				&& !pmd_devmap(*pmd));
 
 	count_vm_event(THP_SPLIT_PMD);
@@ -2094,7 +2128,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			put_page(page);
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (is_huge_zero_pmd(*pmd)) {
+	} else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
@@ -2138,6 +2172,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pfn_to_page(swp_offset(entry));
 	} else
 #endif
+	if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd))
+		return __split_huge_swap_pmd(vma, haddr, pmd);
+	else
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2229,14 +2266,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmd against. Otherwise we can end up replacing wrong page.
 	 */
 	VM_BUG_ON(freeze && !page);
-	if (page && page != pmd_page(*pmd))
-		goto out;
+	/* pmd_page() should be called only if pmd_present() */
+	if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd)))
+		goto out;
 
 	if (pmd_trans_huge(*pmd)) {
 		page = pmd_page(*pmd);
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd)))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e06cc1581d1e..16723b9d971a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4034,6 +4034,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 }
 #endif
 
+#ifdef CONFIG_THP_SWAP
+/*
+ * The corresponding page table shouldn't be changed under us, that
+ * is, the page table lock should be held.
+ */
+int split_swap_cluster_map(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+
+	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
+	si = _swap_info_get(entry);
+	if (!si)
+		return -EBUSY;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else, we are done */
+	if (!cluster_is_huge(ci))
+		goto out;
+	cluster_add_swapcount(ci, -1);
+	/*
+	 * If the last PMD swap mapping has gone and the THP isn't in
+	 * swap cache, the huge swap cluster will be split.
+	 */
+	if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
+out:
+	unlock_cluster(ci);
+	return 0;
+}
+#endif
+
 static int __init swapfile_init(void)
 {
 	int nid;
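
For reference, a minimal caller sketch showing the locking protocol
that split_swap_cluster_map() relies on.  example_split_pmd_swap() is
hypothetical (not part of this patch); pmd_lock(), is_swap_pmd() and
__split_huge_swap_pmd() are the primitives used above.  The point is
that the PMD page table lock is held across the split, which satisfies
the "page table shouldn't be changed under us" precondition:

	/* hypothetical caller, for illustration only */
	static void example_split_pmd_swap(struct vm_area_struct *vma,
					   pmd_t *pmd, unsigned long haddr)
	{
		spinlock_t *ptl;

		ptl = pmd_lock(vma->vm_mm, pmd);	/* pin the PMD */
		if (is_swap_pmd(*pmd))
			/* calls split_swap_cluster_map() internally */
			__split_huge_swap_pmd(vma, haddr, pmd);
		spin_unlock(ptl);
	}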