From patchwork Wed May 9 08:38:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10388659
From: "Huang, Ying"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A.
Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan
Subject: [PATCH -mm -V2 09/21] mm, THP, swap: Swapin a THP as a whole
Date: Wed, 9 May 2018 16:38:34 +0800
Message-Id: <20180509083846.14823-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.1
In-Reply-To: <20180509083846.14823-1-ying.huang@intel.com>
References: <20180509083846.14823-1-ying.huang@intel.com>

From: Huang Ying

With this patch, when the page fault handler finds a PMD swap mapping,
it swaps in the THP as a whole.  This avoids the overhead of splitting
the THP before swapping and collapsing it again afterwards, and it
greatly improves swap performance, for example by reducing the page
fault count.

do_huge_pmd_swap_page() is added in the patch to implement this.  It is
similar to do_swap_page(), which handles normal page swapin.

If allocating a THP fails, the huge swap cluster and the PMD swap
mapping are split so that the fault falls back to normal page swapin.

If the huge swap cluster has already been split, the PMD swap mapping
is split so that the fault falls back to normal page swapin.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A.
Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
---
 include/linux/huge_mm.h |   9 +++
 include/linux/swap.h    |   9 +++
 mm/huge_memory.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 +++--
 4 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0dbfbe34b01a..f5348d072351 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -402,4 +402,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d2e017dd7bbd..5832a750baed 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -560,6 +560,15 @@ static inline struct page *lookup_swap_cache(swp_entry_t swp,
 	return NULL;
 }
 
+static inline struct page *read_swap_cache_async(swp_entry_t swp,
+						 gfp_t gft_mask,
+						 struct vm_area_struct *vma,
+						 unsigned long addr,
+						 bool do_poll)
+{
+	return NULL;
+}
+
 static inline int add_to_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9581adae1c77..de6a32226121 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -1612,6 +1614,174 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
 }
+
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+			 */
+			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+				ret = split_swap_cluster(entry, false);
+				/*
+				 * Retry if somebody else swap in the swap
+				 * entry
+				 */
+				if (ret == -EEXIST) {
+					ret = 0;
+					goto retry;
+				/* swapoff occurs under us */
+				} else if (ret == -EINVAL)
+					ret = 0;
+				else
+					goto fallback;
+			}
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			goto out;
+		}
+
+		/* Had to read the page from swap area: Major fault */
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+	} else if (!PageTransCompound(page))
+		goto fallback;
+
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!locked) {
+		ret |= VM_FAULT_RETRY;
+		goto out_release;
+	}
+
+	/*
+	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+	 * release the swapcache from under us.  The page pin, and pmd_same
+	 * test below, are not enough to exclude that.  Even if it is still
+	 * swapcache, we need to check that the page's swap has not changed.
+	 */
+	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+		goto out_page;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = VM_FAULT_OOM;
+		goto out_page;
+	}
+
+	/*
+	 * Back out if somebody else already faulted in this pmd.
+	 */
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+		goto out_nomap;
+
+	if (unlikely(!PageUptodate(page))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_nomap;
+	}
+
+	/*
+	 * The page isn't present yet, go ahead with the fault.
+	 *
+	 * Be careful about the sequence of operations here.
+	 * To get its accounting right, reuse_swap_page() must be called
+	 * while the page is counted on swap but not yet in mapcount i.e.
+	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+	 * must be called after the swap_free(), or it will never succeed.
+	 */
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	pmd = mk_huge_pmd(page, vma->vm_page_prot);
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+		pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+		vmf->flags &= ~FAULT_FLAG_WRITE;
+		ret |= VM_FAULT_WRITE;
+		exclusive = RMAP_EXCLUSIVE;
+	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+	if (pmd_swp_soft_dirty(orig_pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+	do_page_add_anon_rmap(page, vma, haddr,
+			      exclusive | RMAP_COMPOUND);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	activate_page(page);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+	swap_free(entry, true);
+	if (mem_cgroup_swap_full(page) ||
+	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+		try_to_free_swap(page);
+	unlock_page(page);
+
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		ret |= do_huge_pmd_wp_page(vmf, pmd);
+		if (ret & VM_FAULT_ERROR)
+			ret &= VM_FAULT_ERROR;
+		goto out;
+	}
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+out:
+	return ret;
+out_nomap:
+	mem_cgroup_cancel_charge(page, memcg, true);
+	spin_unlock(vmf->ptl);
+out_page:
+	unlock_page(page);
+out_release:
+	put_page(page);
+	return ret;
+fallback:
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+		ret = VM_FAULT_FALLBACK;
+	else
+		ret = 0;
+	if (page)
+		put_page(page);
+	return ret;
+}
 #else
 static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
 					 unsigned long haddr,
diff --git a/mm/memory.c b/mm/memory.c
index 835111148dad..bc88347c8ba8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4060,13 +4060,17 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 		barrier();
 		if (unlikely(is_swap_pmd(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-					  !is_pmd_migration_entry(orig_pmd));
-			if (is_pmd_migration_entry(orig_pmd))
+			if (thp_migration_supported() &&
+			    is_pmd_migration_entry(orig_pmd)) {
 				pmd_migration_entry_wait(mm, vmf.pmd);
-			return 0;
-		}
-		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+				return 0;
+			} else if (thp_swap_supported()) {
+				ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+				if (!(ret & VM_FAULT_FALLBACK))
+					return ret;
+			} else
+				VM_BUG_ON(1);
+		} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf, orig_pmd);
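
Illustrative note (not part of the patch): the fallback rules described
in the commit message can be summarized with a small userspace model of
the decision made in do_huge_pmd_swap_page().  This is only a sketch
under stated assumptions.  The enum, the struct and model_huge_swapin()
are hypothetical names that do not exist in the kernel, the kernel
reports these outcomes through VM_FAULT_* flags instead, and the
-EEXIST (retry) and -EINVAL (swapoff) results of split_swap_cluster(),
page locking and memcg charging are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; the kernel returns VM_FAULT_* flags instead. */
enum swapin_outcome {
	SWAPIN_WHOLE_THP,	/* huge page mapped with a single PMD */
	SWAPIN_FALLBACK,	/* PMD swap mapping split, swap in per PTE */
	SWAPIN_BACK_OUT,	/* PMD changed under us, nothing to do */
};

/* Hypothetical snapshot of the conditions the handler checks. */
struct swapin_state {
	bool pmd_unchanged;	/* models pmd_same(*vmf->pmd, orig_pmd) */
	bool page_in_cache;	/* models lookup_swap_cache() finding a page */
	bool cached_page_huge;	/* models PageTransCompound(page) */
	bool thp_alloc_ok;	/* models read_swap_cache_async() succeeding */
};

static enum swapin_outcome model_huge_swapin(const struct swapin_state *s)
{
	assert(s != NULL);

	if (s->page_in_cache) {
		/* Huge swap cluster already split: only the PMD mapping is split. */
		if (!s->cached_page_huge)
			return s->pmd_unchanged ? SWAPIN_FALLBACK
						: SWAPIN_BACK_OUT;
	} else if (!s->thp_alloc_ok) {
		/* THP allocation failed: split the cluster, then the PMD mapping. */
		return s->pmd_unchanged ? SWAPIN_FALLBACK : SWAPIN_BACK_OUT;
	}

	/* A huge page is available: map it unless the PMD changed meanwhile. */
	return s->pmd_unchanged ? SWAPIN_WHOLE_THP : SWAPIN_BACK_OUT;
}
```

In this model SWAPIN_FALLBACK corresponds to returning
VM_FAULT_FALLBACK, after which __handle_mm_fault() continues with the
normal per-PTE swapin path, and SWAPIN_BACK_OUT corresponds to
returning without installing a mapping.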