From patchwork Mon Nov 30 05:08:44 2015
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7721031
Subject: [RFC PATCH 2/5] mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd
From: Dan Williams
To: linux-mm@kvack.org
Cc: Andrea Arcangeli, Dave Hansen, toshi.kani@hp.com,
 linux-nvdimm@lists.01.org, Peter Zijlstra, Mel Gorman, Andrew Morton,
 "Kirill A. Shutemov"
Date: Sun, 29 Nov 2015 21:08:44 -0800
Message-ID: <20151130050844.18366.61858.stgit@dwillia2-desk3.jf.intel.com>
In-Reply-To: <20151130050833.18366.21963.stgit@dwillia2-desk3.jf.intel.com>
References: <20151130050833.18366.21963.stgit@dwillia2-desk3.jf.intel.com>
User-Agent: StGit/0.17.1-9-g687f

A dax-huge-page mapping, while it uses some thp helpers, is ultimately
not a transparent huge page.  The distinction is especially important
in the get_user_pages() path.  pmd_devmap() is used to distinguish
dax-pmds from pmd_huge() and pmd_trans_huge() mappings, which have
slightly different semantics.

Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
pmd_devmap() checks.  Also, before we introduce uses of pmd_pfn() in
common code, include a definition for archs that have not needed it to
date.

Cc: Dave Hansen
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc: Andrew Morton
Cc: Kirill A. Shutemov
Signed-off-by: Dan Williams
---
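
[ Note: _PAGE_DEVMAP and the pmd_devmap() helper are introduced by a
  companion patch in this series, not by this one; the sketch below is
  what the x86 definition is expected to look like, shown here only so
  the pmd_trans_huge() change reads in context. ]

	/* pmd has a corresponding struct dev_pagemap (dax), not a thp */
	static inline int pmd_devmap(pmd_t pmd)
	{
		return !!(pmd_val(pmd) & _PAGE_DEVMAP);
	}

With a software bit dedicated to device mappings, the updated x86
pmd_trans_huge() below reads "_PAGE_PSE set and _PAGE_DEVMAP clear": a
dax pmd sets both bits, so it still maps a huge extent for page-table
purposes but is never mistaken for an anonymous thp.
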
 arch/ia64/include/asm/pgtable.h      |    1 +
 arch/sh/include/asm/pgtable-3level.h |    1 +
 arch/x86/include/asm/pgtable.h       |    2 +-
 include/linux/huge_mm.h              |    3 ++-
 mm/huge_memory.c                     |   23 +++++++++++++----------
 mm/memory.c                          |    8 ++++----
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 9f3ed9ee8f13..81d2af23958f 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -273,6 +273,7 @@ extern unsigned long VMALLOC_END;
 #define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
 #define pmd_page_vaddr(pmd)		((unsigned long) __va(pmd_val(pmd) & _PFN_MASK))
 #define pmd_page(pmd)			virt_to_page((pmd_val(pmd) + PAGE_OFFSET))
+#define pmd_pfn(pmd)			(pmd_val(pmd) >> PAGE_SHIFT)
 
 #define pud_none(pud)			(!pud_val(pud))
 #define pud_bad(pud)			(!ia64_phys_addr_valid(pud_val(pud)))
diff --git a/arch/sh/include/asm/pgtable-3level.h b/arch/sh/include/asm/pgtable-3level.h
index 249a985d9648..bb29a80fb40e 100644
--- a/arch/sh/include/asm/pgtable-3level.h
+++ b/arch/sh/include/asm/pgtable-3level.h
@@ -29,6 +29,7 @@ typedef struct { unsigned long long pmd; } pmd_t;
 
 #define pmd_val(x)	((x).pmd)
+#define pmd_pfn(x)	((pmd_val(x) & PMD_MASK) >> PAGE_SHIFT)
 #define __pmd(x)	((pmd_t) { (x) } )
 
 static inline unsigned long pud_page_vaddr(pud_t pud)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 02096a5dec2a..d5747ada2a76 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -178,7 +178,7 @@ static inline int pmd_trans_splitting(pmd_t pmd)
 
 static inline int pmd_trans_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) & _PAGE_PSE;
+	return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
 }
 
 static inline int has_transparent_hugepage(void)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d218abedfeb9..9c9c1688889a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -105,7 +105,8 @@ extern void __split_huge_page_pmd(struct vm_area_struct *vma,
 #define split_huge_page_pmd(__vma, __address, __pmd)		\
 	do {							\
 		pmd_t *____pmd = (__pmd);			\
-		if (unlikely(pmd_trans_huge(*____pmd)))		\
+		if (unlikely(pmd_trans_huge(*____pmd)		\
+				|| pmd_devmap(*____pmd)))	\
 			__split_huge_page_pmd(__vma, __address,	\
 					____pmd);		\
 	} while (0)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6b506df659ec..329cedf48b8a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -933,7 +933,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	ret = -EAGAIN;
 	pmd = *src_pmd;
-	if (unlikely(!pmd_trans_huge(pmd))) {
+	if (unlikely(!pmd_trans_huge(pmd) && !pmd_devmap(pmd))) {
 		pte_free(dst_mm, pgtable);
 		goto out_unlock;
 	}
@@ -965,17 +965,20 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		wait_split_huge_page(vma->anon_vma, src_pmd); /* src_vma */
 		goto out;
 	}
-	src_page = pmd_page(pmd);
-	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
-	get_page(src_page);
-	page_dup_rmap(src_page);
-	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	if (pmd_trans_huge(pmd)) {
+		/* thp accounting separate from pmd_devmap accounting */
+		src_page = pmd_page(pmd);
+		VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+		get_page(src_page);
+		page_dup_rmap(src_page);
+		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+		atomic_long_inc(&dst_mm->nr_ptes);
+	}
 
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
-	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
-	atomic_long_inc(&dst_mm->nr_ptes);
 
 	ret = 0;
 out_unlock:
@@ -1599,7 +1602,7 @@ int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
 		spinlock_t **ptl)
 {
 	*ptl = pmd_lock(vma->vm_mm, pmd);
-	if (likely(pmd_trans_huge(*pmd))) {
+	if (likely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd))) {
 		if (unlikely(pmd_trans_splitting(*pmd))) {
 			spin_unlock(*ptl);
 			wait_split_huge_page(vma->anon_vma, pmd);
@@ -2975,7 +2978,7 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
 again:
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	ptl = pmd_lock(mm, pmd);
-	if (unlikely(!pmd_trans_huge(*pmd)))
+	if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
 		goto unlock;
 	if (vma_is_dax(vma)) {
 		pmd_t _pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 6a34be836a3b..5b9c6bab80d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -961,7 +961,7 @@ static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*src_pmd)) {
+		if (pmd_trans_huge(*src_pmd) || pmd_devmap(*src_pmd)) {
 			int err;
 			VM_BUG_ON(next-addr != HPAGE_PMD_SIZE);
 			err = copy_huge_pmd(dst_mm, src_mm,
@@ -1193,7 +1193,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*pmd)) {
+		if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE) {
 #ifdef CONFIG_DEBUG_VM
 				if (!rwsem_is_locked(&tlb->mm->mmap_sem)) {
@@ -3366,7 +3366,7 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		int ret;
 
 		barrier();
-		if (pmd_trans_huge(orig_pmd)) {
+		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			unsigned int dirty = flags & FAULT_FLAG_WRITE;
 
 			/*
@@ -3403,7 +3403,7 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	    unlikely(__pte_alloc(mm, vma, pmd, address)))
 		return VM_FAULT_OOM;
 	/* if an huge pmd materialized from under us just retry later */
-	if (unlikely(pmd_trans_huge(*pmd)))
+	if (unlikely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
 		return 0;
 	/*
 	 * A regular pmd is established and it can't morph into a huge pmd
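
[ Illustration only: the helper below is hypothetical, sketched to show
  the kind of common-code pmd_pfn() use this patch prepares for, namely
  recovering the pfn that backs an address inside a pmd-sized mapping. ]

	static unsigned long pmd_addr_to_pfn(pmd_t pmd, unsigned long addr)
	{
		/* first pfn of the pmd-sized mapping... */
		unsigned long pfn = pmd_pfn(pmd);

		/* ...plus the 4K-page offset of @addr within it */
		return pfn + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
	}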
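
[ Illustration only: every callsite touched in mm/memory.c and
  mm/huge_memory.c above now asks the same question, so a pmd walker
  written against this patch could wrap it in a predicate like the
  hypothetical one below. ]

	/*
	 * True for any pmd that directly maps a pmd-sized extent
	 * (anonymous thp or dax) and therefore must not be
	 * dereferenced as a pte table.
	 */
	static inline bool pmd_huge_mapping(pmd_t pmd)
	{
		return pmd_trans_huge(pmd) || pmd_devmap(pmd);
	}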