From patchwork Fri Jun 23 07:13:02 2017
From: "Huang, Ying"
To: Andrew Morton
Subject: [PATCH -mm -v2 11/12] mm, THP, swap: Delay splitting THP after swapped out
Date: Fri, 23 Jun 2017 15:13:02 +0800
Message-Id: <20170623071303.13469-12-ying.huang@intel.com>
In-Reply-To: <20170623071303.13469-1-ying.huang@intel.com>
References: <20170623071303.13469-1-ying.huang@intel.com>
Cc: Andrea Arcangeli, Rik van Riel, linux-nvdimm@lists.01.org,
 Huang Ying, Hugh Dickins, linux-kernel@vger.kernel.org, Michal Hocko,
 linux-mm@kvack.org, Johannes Weiner, Minchan Kim, Shaohua Li,
 "Kirill A. Shutemov"

From: Huang Ying

In this patch, splitting a transparent huge page (THP) during swap-out
is delayed from just after the THP is added to the swap cache to after
the swap-out finishes.  This allows more of the operations involved in
reclaiming an anonymous THP, such as writing the THP to the swap device
and removing it from the swap cache, to be batched, which improves the
performance of anonymous THP swap-out.

This is the second step of THP swap support.  The plan is to delay
splitting the THP step by step, and finally avoid splitting the THP
altogether.
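As background for the reference-count changes below: the swap cache
radix tree pins each subpage of a THP, so a THP in the swap cache holds
HPAGE_PMD_NR radix tree references rather than one.  The small
userspace model below is not part of the patch; HPAGE_PMD_NR is
hard-coded to 512 (the x86-64 value with 2MB huge pages) and the
model_* names are invented for illustration.  It shows the freeable
check the patch switches to:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512	/* subpages per THP, x86-64 2MB PMD */

struct page_model {
	int ref_count;		/* models page_count() */
	int private_refs;	/* models page_has_private() */
	bool is_thp;		/* models PageTransHuge() */
	bool in_swap_cache;	/* models PageSwapCache() */
};

/*
 * Models the patched is_page_cache_freeable(): the caller holds one
 * reference from isolating the page, and the swap cache radix tree
 * holds one reference per subpage -- HPAGE_PMD_NR for a THP in the
 * swap cache, 1 for a normal page.
 */
static bool model_is_page_cache_freeable(const struct page_model *p)
{
	int radix_pins = (p->is_thp && p->in_swap_cache) ? HPAGE_PMD_NR : 1;

	return p->ref_count - p->private_refs == 1 + radix_pins;
}

int main(void)
{
	/* Normal page: isolation reference + one swap cache reference. */
	struct page_model base = { .ref_count = 2, .in_swap_cache = true };
	/* THP: isolation reference + HPAGE_PMD_NR swap cache references. */
	struct page_model thp = {
		.ref_count = 1 + HPAGE_PMD_NR,
		.is_thp = true,
		.in_swap_cache = true,
	};

	assert(model_is_page_cache_freeable(&base));
	assert(model_is_page_cache_freeable(&thp));
	printf("both pages count as freeable\n");
	return 0;
}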
With the patchset, the swap-out throughput improves by 42% (from about
5.81GB/s to about 8.25GB/s) in the vm-scalability swap-w-seq test case
with 16 processes.  At the same time, the number of IPIs (which
reflects TLB flushing) is reduced by about 78.9%.  The test was done on
a Xeon E5 v3 system, with a RAM-simulated PMEM (persistent memory)
device as the swap device.  To test sequential swap-out, the test case
creates 8 processes, which sequentially allocate and write to anonymous
pages until the RAM and part of the swap device are used up.

Signed-off-by: "Huang, Ying"
Cc: Johannes Weiner
Cc: Minchan Kim
Cc: Hugh Dickins
Cc: Shaohua Li
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
---
 mm/vmscan.c | 95 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 52 insertions(+), 43 deletions(-)
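A note for reviewers on the resulting control flow in
shrink_page_list(): a PMD-mapped anonymous THP is now added to the swap
cache and written out as a whole.  It is split up front only when it
has no PMD mapping (so some or all tail pages may be freeable without
IO), and split as a fallback only when swap space cannot be allocated
for the whole THP.  The following userspace sketch models that decision
flow; it is not kernel code, and all names in it are invented stand-ins
for the kernel predicates:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel predicates used by the patch. */
struct page_state {
	bool anon, swap_backed, in_swap_cache;
	bool is_thp, pmd_mapped, can_split, swap_space_available;
};

enum verdict { RECLAIM, ACTIVATE, SPLIT_THEN_RECLAIM };

/* Models the swap-allocation branch of the patched shrink_page_list(). */
static enum verdict swap_out_decision(const struct page_state *p)
{
	if (!(p->anon && p->swap_backed) || p->in_swap_cache)
		return RECLAIM;		/* no swap allocation needed here */
	if (p->is_thp) {
		if (!p->can_split)
			return ACTIVATE;	/* cannot split THP, skip it */
		/* No PMD map: split now, tail pages may free without IO. */
		if (!p->pmd_mapped)
			return SPLIT_THEN_RECLAIM;
	}
	if (p->swap_space_available)
		return RECLAIM;		/* whole THP goes to swap, no split */
	/* Fallback: split and swap the base pages individually. */
	return p->is_thp ? SPLIT_THEN_RECLAIM : ACTIVATE;
}

int main(void)
{
	struct page_state thp = {
		.anon = true, .swap_backed = true, .is_thp = true,
		.pmd_mapped = true, .can_split = true,
		.swap_space_available = true,
	};

	printf("PMD-mapped THP with free swap: %s\n",
	       swap_out_decision(&thp) == RECLAIM ?
	       "reclaimed whole" : "split first");
	return 0;
}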
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f84cdd3751e1..f3abaef7c0b5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -535,7 +535,9 @@ static inline int is_page_cache_freeable(struct page *page)
 	 * that isolated the page, the page cache radix tree and
 	 * optional buffer heads at page->private.
 	 */
-	return page_count(page) - page_has_private(page) == 2;
+	int radix_pins = PageTransHuge(page) && PageSwapCache(page) ?
+		HPAGE_PMD_NR : 1;
+	return page_count(page) - page_has_private(page) == 1 + radix_pins;
 }
 
 static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
@@ -665,6 +667,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 			    bool reclaimed)
 {
 	unsigned long flags;
+	int refcount;
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(mapping != page_mapping(page));
@@ -695,11 +698,15 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	 * Note that if SetPageDirty is always performed via set_page_dirty,
 	 * and thus under tree_lock, then this ordering is not required.
 	 */
-	if (!page_ref_freeze(page, 2))
+	if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
+		refcount = 1 + HPAGE_PMD_NR;
+	else
+		refcount = 2;
+	if (!page_ref_freeze(page, refcount))
 		goto cannot_free;
 	/* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */
 	if (unlikely(PageDirty(page))) {
-		page_ref_unfreeze(page, 2);
+		page_ref_unfreeze(page, refcount);
 		goto cannot_free;
 	}
 
@@ -1121,58 +1128,56 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Try to allocate it some swap space here.
 		 * Lazyfree page could be freed directly
 		 */
-		if (PageAnon(page) && PageSwapBacked(page) &&
-		    !PageSwapCache(page)) {
-			if (!(sc->gfp_mask & __GFP_IO))
-				goto keep_locked;
-			if (PageTransHuge(page)) {
-				/* cannot split THP, skip it */
-				if (!can_split_huge_page(page, NULL))
-					goto activate_locked;
-				/*
-				 * Split pages without a PMD map right
-				 * away. Chances are some or all of the
-				 * tail pages can be freed without IO.
-				 */
-				if (!compound_mapcount(page) &&
-				    split_huge_page_to_list(page, page_list))
-					goto activate_locked;
-			}
-			if (!add_to_swap(page)) {
-				if (!PageTransHuge(page))
-					goto activate_locked;
-				/* Split THP and swap individual base pages */
-				if (split_huge_page_to_list(page, page_list))
-					goto activate_locked;
-				if (!add_to_swap(page))
-					goto activate_locked;
-			}
-
-			/* XXX: We don't support THP writes */
-			if (PageTransHuge(page) &&
-			    split_huge_page_to_list(page, page_list)) {
-				delete_from_swap_cache(page);
-				goto activate_locked;
-			}
+		if (PageAnon(page) && PageSwapBacked(page)) {
+			if (!PageSwapCache(page)) {
+				if (!(sc->gfp_mask & __GFP_IO))
+					goto keep_locked;
+				if (PageTransHuge(page)) {
+					/* cannot split THP, skip it */
+					if (!can_split_huge_page(page, NULL))
+						goto activate_locked;
+					/*
+					 * Split pages without a PMD map right
+					 * away. Chances are some or all of the
+					 * tail pages can be freed without IO.
+					 */
+					if (!compound_mapcount(page) &&
+					    split_huge_page_to_list(page,
+								    page_list))
+						goto activate_locked;
+				}
+				if (!add_to_swap(page)) {
+					if (!PageTransHuge(page))
+						goto activate_locked;
+					/* Fallback to swap normal pages */
+					if (split_huge_page_to_list(page,
+								    page_list))
+						goto activate_locked;
+					if (!add_to_swap(page))
+						goto activate_locked;
+				}
 
-			may_enter_fs = 1;
+				may_enter_fs = 1;
 
-			/* Adding to swap updated mapping */
-			mapping = page_mapping(page);
+				/* Adding to swap updated mapping */
+				mapping = page_mapping(page);
+			}
 		} else if (unlikely(PageTransHuge(page))) {
 			/* Split file THP */
 			if (split_huge_page_to_list(page, page_list))
 				goto keep_locked;
 		}
 
-		VM_BUG_ON_PAGE(PageTransHuge(page), page);
-
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
+			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
+
+			if (unlikely(PageTransHuge(page)))
+				flags |= TTU_SPLIT_HUGE_PMD;
+			if (!try_to_unmap(page, flags)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
@@ -1312,7 +1317,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
 		 */
-		list_add(&page->lru, &free_pages);
+		if (unlikely(PageTransHuge(page))) {
+			mem_cgroup_uncharge(page);
+			(*get_compound_page_dtor(page))(page);
+		} else
+			list_add(&page->lru, &free_pages);
 		continue;
 activate_locked:
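
One more note, on the final hunk: the free_pages list is freed in a
batch via a path that handles only normal (order-0) pages, so a
reclaimed THP is uncharged and freed immediately through its compound
page destructor instead.  The sketch below is plain userspace C, with
the fake_page type and all names invented for illustration; it shows
the destructor-dispatch pattern that
(*get_compound_page_dtor(page))(page) relies on:

#include <stdio.h>

/* Hypothetical model of a compound page with a destructor slot. */
struct fake_page;
typedef void (*compound_dtor_t)(struct fake_page *);

struct fake_page {
	int order;
	compound_dtor_t compound_dtor;	/* models page[1].compound_dtor */
};

static void free_compound_model(struct fake_page *p)
{
	printf("freeing order-%d compound page as a unit\n", p->order);
}

/* Models get_compound_page_dtor(): look up the page's free routine. */
static compound_dtor_t get_dtor(struct fake_page *p)
{
	return p->compound_dtor;
}

int main(void)
{
	struct fake_page thp = {
		.order = 9,	/* 2MB THP with 4KB base pages */
		.compound_dtor = free_compound_model,
	};

	(*get_dtor(&thp))(&thp);	/* dispatch to the free routine */
	return 0;
}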