From patchwork Wed May 23 08:26:10 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10420589
From: "Huang, Ying" <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko,
    Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim,
    Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan
Subject: [PATCH -mm -V3 06/21] mm, THP, swap: Support PMD swap mapping when splitting huge PMD
Date: Wed, 23 May 2018 16:26:10 +0800
Message-Id: <20180523082625.6897-7-ying.huang@intel.com>
In-Reply-To: <20180523082625.6897-1-ying.huang@intel.com>
References: <20180523082625.6897-1-ying.huang@intel.com>

From: Huang Ying <ying.huang@intel.com>

A huge PMD needs to be split when zapping a part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, we need to split it too.
This patch implements support for this.  This is similar to splitting
the PMD page mapping, except that we also need to decrease the PMD
swap mapping count for the huge swap cluster.  If the PMD swap
mapping count becomes 0, the huge swap cluster will be split.

Note: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so the pmd_present() check is called before them.
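To make the result of the split concrete, below is a minimal,
self-contained userspace sketch (illustration only, not kernel code;
the swp_entry_t encoding is deliberately simplified here, while the
real encoding is architecture-specific).  It models how one PMD-level
swap mapping expands into HPAGE_PMD_NR PTE-level swap entries whose
swap offsets increase by one per subpage, mirroring the entry.val++
loop in __split_huge_swap_pmd() below.

#include <stdio.h>

#define HPAGE_PMD_NR 512	/* 2MB huge page / 4KB base page */

/* Simplified model: val is just the swap offset */
typedef struct { unsigned long val; } swp_entry_t;

static unsigned long swp_offset(swp_entry_t entry)
{
	return entry.val;
}

int main(void)
{
	/* PMD swap entry pointing at the first slot of a huge cluster */
	swp_entry_t entry = { .val = 3 * HPAGE_PMD_NR };
	int i;

	for (i = 0; i < HPAGE_PMD_NR; i++, entry.val++) {
		/* each subpage PTE gets the next consecutive swap slot */
		if (i < 2 || i == HPAGE_PMD_NR - 1)
			printf("PTE %3d -> swap offset %lu\n",
			       i, swp_offset(entry));
	}
	return 0;
}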
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan --- include/linux/swap.h | 6 ++++++ mm/huge_memory.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 28 +++++++++++++++++++++++++ 3 files changed, 87 insertions(+), 5 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 7ed2c727c9b6..bb9de2cb952a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e9177363fe2e..84d5d8ff869e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1602,6 +1602,47 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, haddr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} +#else +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ +} +#endif + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. @@ -2068,7 +2109,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2090,8 +2131,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, MM_FILEPAGES, -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* + * is_huge_zero_pmd() may return true for PMD swap + * entry, so checking pmd_present() firstly. + * * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside * __split_huge_pmd() ? 
@@ -2134,6 +2178,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pfn_to_page(swp_offset(entry));
 	} else
 #endif
+	if (thp_swap_supported() && is_swap_pmd(old_pmd))
+		return __split_huge_swap_pmd(vma, haddr, pmd);
+	else
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2225,14 +2272,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmd against. Otherwise we can end up replacing wrong page.
 	 */
 	VM_BUG_ON(freeze && !page);
-	if (page && page != pmd_page(*pmd))
-		goto out;
+	/* pmd_page() should be called only if pmd_present() */
+	if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd)))
+		goto out;
 
 	if (pmd_trans_huge(*pmd)) {
 		page = pmd_page(*pmd);
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd)))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46117d6913ad..05f53c4c0cfe 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4046,6 +4046,34 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 	}
 }
 
+#ifdef CONFIG_THP_SWAP
+/* The corresponding page table shouldn't be changed under us */
+int split_swap_cluster_map(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+
+	VM_BUG_ON(!is_cluster_offset(offset));
+	si = _swap_info_get(entry);
+	if (!si)
+		return -EBUSY;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else */
+	if (!cluster_is_huge(ci))
+		goto out;
+	cluster_set_count(ci, cluster_count(ci) - 1);
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	if (cluster_count(ci) == SWAPFILE_CLUSTER &&
+	    !(si->swap_map[offset] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	return 0;
+}
+#endif
+
 static int __init swapfile_init(void)
 {
 	int nid;
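As an aside for readers following the cluster count manipulation
above, here is a self-contained userspace model of the accounting
split_swap_cluster_map() appears to implement (an illustrative sketch
under my reading of this patchset, not kernel code: struct cluster,
split_map() and the has_cache field are hypothetical stand-ins).  The
assumption is that cluster_count() holds SWAPFILE_CLUSTER plus one
extra reference per PMD swap mapping, so dropping a PMD mapping
decrements the count, and the cluster stops being huge once the last
PMD mapping is gone and the swap cache no longer holds the huge page.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SWAPFILE_CLUSTER 512

struct cluster {
	unsigned int count;	/* SWAPFILE_CLUSTER + nr of PMD swap mappings */
	bool huge;		/* models cluster_is_huge()/cluster_clear_huge() */
	bool has_cache;		/* stand-in for SWAP_HAS_CACHE on the first slot */
};

/* Drop one PMD swap mapping; mirrors the count logic in the patch */
static void split_map(struct cluster *ci)
{
	if (!ci->huge)		/* already split by someone else */
		return;
	ci->count--;
	assert(ci->count >= SWAPFILE_CLUSTER);
	if (ci->count == SWAPFILE_CLUSTER && !ci->has_cache)
		ci->huge = false;	/* last PMD mapping is gone */
}

int main(void)
{
	/* two PMD mappings of one huge swap cluster, no swap cache */
	struct cluster ci = {
		.count = SWAPFILE_CLUSTER + 2,
		.huge = true,
		.has_cache = false,
	};

	split_map(&ci);
	printf("after 1st split: count=%u huge=%d\n", ci.count, (int)ci.huge);
	split_map(&ci);
	printf("after 2nd split: count=%u huge=%d\n", ci.count, (int)ci.huge);
	return 0;
}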