From patchwork Tue Sep 25 07:13:28 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613487
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Tue, 25 Sep 2018 15:13:28 +0800
Message-Id: <20180925071348.31458-2-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things beyond actual swap, such as page migration. We currently support
the THP/PMD "swap entry" only for page migration, and the functions
behind it are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION). But we also need these functions for
the THP swap optimization. So a new config option
(CONFIG_HAVE_PMD_SWAP_ENTRY) is added. It is enabled when either
CONFIG_ARCH_ENABLE_THP_MIGRATION or CONFIG_THP_SWAP is enabled, and the
PMD swap entry functions are tied to this new config option instead.
Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are for page
migration only; they remain enabled only for that.
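For illustration only: with the new option in place, generic code can key
off CONFIG_HAVE_PMD_SWAP_ENTRY instead of the migration option. A minimal
sketch follows; have_pmd_swap_entry() is a hypothetical helper invented
here for illustration (it is not added by this patch), while
pmd_to_swp_entry() is the real helper from include/linux/swapops.h:

	/* Hypothetical sketch, not part of the patch. */
	static inline bool have_pmd_swap_entry(void)
	{
		/* True when CONFIG_THP_SWAP or
		 * CONFIG_ARCH_ENABLE_THP_MIGRATION is enabled. */
		return IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY);
	}

	/* ... later, in some PMD handling path: */
	if (have_pmd_swap_entry())
		entry = pmd_to_swp_entry(pmd);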
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h        | 44 ++++++++++++++++++++++--------------------
 mm/Kconfig                     |  8 ++++++++
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e805292..e830ab345551 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1333,7 +1333,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 5657a20e0c59..eb1e9d17371b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 22af9d8a84ae..79ccbf8789d5 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -259,17 +259,7 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-		struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-		struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
@@ -287,6 +277,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -307,16 +319,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;
diff --git a/mm/Kconfig b/mm/Kconfig
index c6a0d82af45f..b7f7fb145d0f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -424,6 +424,14 @@ config THP_SWAP
 	  For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+	def_bool y
+	depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
 	def_bool y
 	depends on TRANSPARENT_HUGEPAGE

From patchwork Tue Sep 25 07:13:29 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613489
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 02/21] swap: Add __swap_duplicate_locked()
Date: Tue, 25 Sep 2018 15:13:29 +0800
Message-Id: <20180925071348.31458-3-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is separated
out into a new function, __swap_duplicate_locked(). This is done because
we will add more logic about the PMD swap mapping into __swap_duplicate()
while keeping most of the PTE swap mapping logic in
__swap_duplicate_locked(). This is purely mechanical code refactoring;
there is no functional change in this patch.
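After the split, __swap_duplicate() keeps only the device lookup and the
locking around the new helper. Roughly, simplified from the diff below
(error handling for the !p case elided):

	/* Simplified shape of __swap_duplicate() after this patch. */
	p = get_swap_device(entry);		/* pins the swap device */
	offset = swp_offset(entry);
	ci = lock_cluster_or_swap_info(p, offset);
	err = __swap_duplicate_locked(p, offset, usage);
	unlock_cluster_or_swap_info(p, ci);
	put_swap_device(p);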
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 97a1bd1a7c9a..6a570ef00fa7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3436,32 +3436,12 @@ void si_swapinfo(struct sysinfo *val)
 	spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+				   unsigned long offset, unsigned char usage)
 {
-	struct swap_info_struct *p;
-	struct swap_cluster_info *ci;
-	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err = -EINVAL;
-
-	p = get_swap_device(entry);
-	if (!p)
-		goto out;
-
-	offset = swp_offset(entry);
-	ci = lock_cluster_or_swap_info(p, offset);
+	int err = 0;
 
 	count = p->swap_map[offset];
 
@@ -3471,12 +3451,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	 */
 	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
 		err = -ENOENT;
-		goto unlock_out;
+		goto out;
 	}
 
 	has_cache = count & SWAP_HAS_CACHE;
 	count &= ~SWAP_HAS_CACHE;
-	err = 0;
 
 	if (usage == SWAP_HAS_CACHE) {
 
@@ -3503,11 +3482,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 	p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+	return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	struct swap_info_struct *p;
+	struct swap_cluster_info *ci;
+	unsigned long offset;
+	int err = -EINVAL;
+
+	p = get_swap_device(entry);
+	if (!p)
+		goto out;
+
+	offset = swp_offset(entry);
+	ci = lock_cluster_or_swap_info(p, offset);
+	err = __swap_duplicate_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
+
+	put_swap_device(p);
 out:
-	if (p)
-		put_swap_device(p);
 	return err;
 }

From patchwork Tue Sep 25 07:13:30 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613493
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Tue, 25 Sep 2018 15:13:30 +0800
Message-Id: <20180925071348.31458-4-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

To support swapin of a THP in one piece, we need to create PMD swap
mappings during swapout and maintain the PMD swap mapping count.
This patch implements the support to increase the PMD swap mapping count
(for swapout, fork, etc.) and to set the SWAP_HAS_CACHE flag (for swapin,
etc.) for a huge swap cluster in the swap_duplicate() function family.
Although it implements only part of the design of the swap reference
count with PMD swap mapping, the whole design is described below to make
it easier to understand the patch and the whole picture.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP becomes a PMD swap mapping
to the huge swap cluster via a swap entry in the PMD, while a PTE page
mapping to a subpage of the THP becomes a PTE swap mapping to a swap slot
in the huge swap cluster via a swap entry in the PTE. If there is no PMD
swap mapping and the corresponding THP is removed from the page cache
(reclaimed), the huge swap cluster will be split and become a normal swap
cluster.

The count (cluster_count()) of the huge swap cluster is SWAPFILE_CLUSTER
(= HPAGE_PMD_NR) + the PMD swap mapping count. Because every swap slot in
the huge swap cluster is mapped by a PTE or a PMD, or has the
SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR. The PMD swap mapping count is recorded as well, to make it
easy to determine whether any PMD swap mappings remain.

The count in swap_map[offset] is the sum of the PTE and PMD swap mapping
counts. This means that when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster. An alternative would be to make swap_map[offset] record the PTE
swap map count only, given that the PMD swap mapping count is already
recorded in the count of the huge swap cluster. But that would require
increasing swap_map[offset] when splitting a PMD swap mapping, which may
fail because of the memory allocation for the swap count continuation,
and that failure would be hard to deal with. So we chose the current
solution.

A PMD swap mapping to a huge swap cluster may be split, for example when
unmapping part of a PMD mapping. That is easy, because only the count of
the huge swap cluster needs to be changed. When the last PMD swap mapping
is gone and SWAP_HAS_CACHE is unset, we will split the huge swap cluster
(clear the huge flag). This makes it easy to reason about the cluster
state.

A huge swap cluster will be split when splitting a THP in the swap cache,
or when failing to allocate a THP during swapin, etc. But when splitting
the huge swap cluster, we will not try to split all PMD swap mappings,
because sometimes we do not have enough information available for that.
Later, when the PMD swap mapping is duplicated or swapped in, etc., the
PMD swap mapping will be split and fall back to the PTE operation.

When a THP is added into the swap cache, the SWAP_HAS_CACHE flag will be
set in swap_map[offset] for all swap slots inside the huge swap cluster
backing the THP. This huge swap cluster will not be split unless the THP
is split, even if its PMD swap mapping count drops to 0. Later, when the
THP is removed from the swap cache, the SWAP_HAS_CACHE flag will be
cleared in swap_map[offset] for all swap slots inside the huge swap
cluster, and the huge swap cluster will be split if its PMD swap mapping
count is 0.

The first parameter of swap_duplicate() is changed to return the swap
entry to call add_swap_count_continuation() for, because we may need to
call it for a swap entry in the middle of a huge swap cluster.
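To make the bookkeeping concrete, here is a small worked example of the
counting rules above, written as a C comment (illustration only; the
numbers assume x86-64 with 4KB base pages, so SWAPFILE_CLUSTER ==
HPAGE_PMD_NR == 512):

	/*
	 * Worked example of the counting rules (illustration only):
	 *
	 * 1. A THP is swapped out in one piece, leaving one PMD swap mapping:
	 *      cluster_count(ci)    == 512 + 1  (SWAPFILE_CLUSTER + 1 PMD map)
	 *      swap_map[offset + i] == 1        for each of the 512 slots
	 * 2. fork() duplicates the PMD swap mapping:
	 *      cluster_count(ci)    == 512 + 2
	 *      swap_map[offset + i] == 2        (PTE + PMD counts summed)
	 * 3. One process splits its PMD swap mapping into PTE swap mappings:
	 *      cluster_count(ci)    == 512 + 1  (only the cluster count moves)
	 *      swap_map[offset + i] == 2        (1 PTE + 1 PMD, sum unchanged)
	 */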
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |   9 +++--
 mm/memory.c          |   2 +-
 mm/rmap.c            |   2 +-
 mm/swap_state.c      |   2 +-
 mm/swapfile.c        | 107 ++++++++++++++++++++++++++++++++++++++++++---------
 5 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f32f94639b13..3149cdb52e6d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int);
 extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
-extern int swap_duplicate(swp_entry_t);
-extern int swapcache_prepare(swp_entry_t);
+extern int swap_duplicate(swp_entry_t *entry, int entry_size);
+extern int swapcache_prepare(swp_entry_t entry, int entry_size);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
@@ -510,7 +510,8 @@ static inline void show_swap_cache_info(void)
 }
 
 #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
-#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
+#define swapcache_prepare(e, s)						\
+	({(is_migration_entry(e) || is_device_private_entry(e)); })
 
 static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
 {
@@ -521,7 +522,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp)
 {
 }
 
-static inline int swap_duplicate(swp_entry_t swp)
+static inline int swap_duplicate(swp_entry_t *swp, int entry_size)
 {
 	return 0;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 96caf5fe7531..d36b693cef61 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -709,7 +709,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
 		if (likely(!non_swap_entry(entry))) {
-			if (swap_duplicate(entry) < 0)
+			if (swap_duplicate(&entry, 1) < 0)
 				return entry.val;
 
 			/* make sure dst_mm is on swapoff's mmlist. */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1e79fac3186b..3bb4be720bc0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 				break;
 			}
 
-			if (swap_duplicate(entry) < 0) {
+			if (swap_duplicate(&entry, 1) < 0) {
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b50c9b2ec2b4..376327b7b442 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
-		err = swapcache_prepare(entry);
+		err = swapcache_prepare(entry, 1);
 		if (err == -EEXIST) {
 			radix_tree_preload_end();
 			/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6a570ef00fa7..138968b79de5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
 		free_cluster(p, idx);
 }
 
+/*
+ * When swapout a THP in one piece, PMD page mappings to THP are
+ * replaced by PMD swap mappings to the corresponding swap cluster.
+ * cluster_swapcount() returns the PMD swap mapping count.
+ *
+ * cluster_count() = PMD swap mapping count + count of allocated swap
+ * entries in cluster.
+ * If a cluster is mapped by PMD, all swap
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+					 unsigned int count)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+	int count = cluster_swapcount(ci) + add;
+
+	VM_BUG_ON(count < 0);
+	cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
 *
 * Returns error code in following case.
 * - success -> 0
 * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
 * - swap-cache reference is requested but there is already one. -> EEXIST
 * - swap-cache reference is requested but the entry is not used. -> ENOENT
 * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
 */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+			    unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	int err = -EINVAL;
+	int i, size = swap_entry_size(entry_size);
 
-	p = get_swap_device(entry);
+	p = get_swap_device(*entry);
 	if (!p)
 		goto out;
 
-	offset = swp_offset(entry);
+	offset = swp_offset(*entry);
 	ci = lock_cluster_or_swap_info(p, offset);
-	err = __swap_duplicate_locked(p, offset, usage);
+	if (size == SWAPFILE_CLUSTER) {
+		/*
+		 * The huge swap cluster has been split, for example, failed to
+		 * allocate huge page during swapin, the caller should split
+		 * the PMD swap mapping and operate on normal swap entries.
+		 */
+		if (!cluster_is_huge(ci)) {
+			err = -ENOTDIR;
+			goto unlock;
+		}
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
+		/* If cluster is huge, all swap entries inside is in-use */
+		VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	}
+	/* p->swap_map[] = PMD swap map count + PTE swap map count */
+	for (i = 0; i < size; i++) {
+		err = __swap_duplicate_locked(p, offset + i, usage);
+		if (err && size != 1) {
+			*entry = swp_entry(p->type, offset + i);
+			goto undup;
+		}
+	}
+	if (size == SWAPFILE_CLUSTER && usage == 1)
+		cluster_add_swapcount(ci, usage);
+unlock:
 	unlock_cluster_or_swap_info(p, ci);
 
 	put_swap_device(p);
 out:
 	return err;
+undup:
+	for (i--; i >= 0; i--)
+		__swap_entry_free_locked(p, offset + i, usage);
+	goto unlock;
 }
 
 /*
@@ -3524,36 +3589,42 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-	__swap_duplicate(entry, SWAP_MAP_SHMEM);
+	__swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
 * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated. Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *   -> ENOMEM
+ * - otherwise same as __swap_duplicate()
 */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-		err = add_swap_count_continuation(entry, GFP_ATOMIC);
+	while (!err &&
+	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
 	return err;
 }
 
 /*
 * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
 *
 * Called when allocating swap cache for existing swap entry,
 * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
 */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)

From patchwork Tue Sep 25 07:13:31 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613495
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Tue, 25 Sep 2018 15:13:31 +0800
Message-Id: <20180925071348.31458-5-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and replaced
with PTE swap mappings, and when clearing the SWAP_HAS_CACHE flag for the
huge swap cluster in put_swap_page(), the huge swap cluster was always
split. Now, during swapout, the PMD page mappings to the THP are changed
to PMD swap mappings to the corresponding swap cluster. So when clearing
the SWAP_HAS_CACHE flag, the huge swap cluster will be split only if its
PMD swap mapping count is 0. Otherwise, we keep it as a huge swap
cluster, so that we can later swap in the THP in one piece.
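In outline, the decision this patch adds to put_swap_page() looks roughly
like the following sketch (simplified from the diff below; locking and
the actual freeing path are omitted):

	/* Sketch of the new huge-cluster logic in put_swap_page(). */
	if (size == SWAPFILE_CLUSTER) {
		if (!cluster_swapcount(ci)) {
			/* No PMD swap mappings remain: count the slots
			 * that become free once the cache bit drops. */
			for (i = 0; i < SWAPFILE_CLUSTER; i++)
				if (map[i] == SWAP_HAS_CACHE)
					free_entries++;
			/* Split unless every slot can be freed outright. */
			if (free_entries != SWAPFILE_CLUSTER)
				cluster_clear_huge(ci);
		}
		/* Otherwise PMD mappings remain: keep the cluster huge
		 * so the THP can be swapped in as one piece. */
	}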
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 138968b79de5..553d2551b35a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP. This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0. Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster. And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
 */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
 		map = si->swap_map + offset;
-		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-			val = map[i];
-			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-			if (val == SWAP_HAS_CACHE)
-				free_entries++;
+		/*
+		 * No PMD swap mapping, the swap cluster will be freed
+		 * if all swap entries becoming free, otherwise the
+		 * huge swap cluster will be split.
+		 */
+		if (!cluster_swapcount(ci)) {
+			for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+				val = map[i];
+				VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+				if (val == SWAP_HAS_CACHE)
+					free_entries++;
+			}
+			if (free_entries != SWAPFILE_CLUSTER)
+				cluster_clear_huge(ci);
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);

From patchwork Tue Sep 25 07:13:32 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613499
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free()
Date: Tue, 25 Sep 2018 15:13:32 +0800
Message-Id: <20180925071348.31458-6-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

When a PMD swap mapping is removed from a huge swap cluster, for example
when unmapping a memory range mapped with a PMD swap mapping,
free_swap_and_cache() will be called to decrease the reference count of
the huge swap cluster. free_swap_and_cache() may also free or split the
huge swap cluster, and free the corresponding THP in the swap cache if
necessary. swap_free() is similar and shares most of its implementation
with free_swap_and_cache(). This patch revises free_swap_and_cache() and
swap_free() to implement this.

If the swap cluster has already been split, for example because of a
failure to allocate a THP during swapin, we just decrease the reference
count of each swap slot by one. Otherwise, we decrease the reference
count of each swap slot by one and also decrease the PMD swap mapping
count in cluster_count(). When the corresponding THP is not in the swap
cache: if the PMD swap mapping count becomes 0, the huge swap cluster
will be split, and if all swap counts become 0, the huge swap cluster
will be freed. When the corresponding THP is in the swap cache: if every
swap_map[offset] == SWAP_HAS_CACHE, we try to delete the THP from the
swap cache, which will cause the THP and the huge swap cluster to be
freed.
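The decision matrix described above can be summarized in a short sketch
(illustrative pseudocode only; the real logic lives in __swap_free() in
the diff below, and THP_in_swap_cache, all_swap_counts_zero, and
every_slot are shorthand for the conditions spelled out above, not real
identifiers):

	/* Sketch of the __swap_free() decision flow for a huge cluster. */
	if (!cluster_is_huge(ci)) {
		/* Already split: just drop one reference per swap slot. */
	} else {
		cluster_add_swapcount(ci, -1);	/* drop a PMD mapping count */
		/* ... also drop one reference per swap slot, then: */
		if (!THP_in_swap_cache) {
			if (all_swap_counts_zero)
				/* free the whole huge swap cluster */;
			else if (!cluster_swapcount(ci))
				cluster_clear_huge(ci);	/* split the cluster */
		} else if (every_slot == SWAP_HAS_CACHE) {
			/* try to delete the THP from the swap cache, which
			 * frees the THP and the huge swap cluster */
		}
	}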
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/s390/mm/pgtable.c |   2 +-
 include/linux/swap.h   |   9 +--
 kernel/power/swap.c    |   4 +-
 mm/madvise.c           |   2 +-
 mm/memory.c            |   4 +-
 mm/shmem.c             |   6 +-
 mm/swapfile.c          | 171 ++++++++++++++++++++++++++++++++++++++-----------
 7 files changed, 149 insertions(+), 49 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index f2cc7da473e4..ffd4b68adbb3 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 		dec_mm_counter(mm, mm_counter(page));
 	}
-	free_swap_and_cache(entry);
+	free_swap_and_cache(entry, 1);
 }
 
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3149cdb52e6d..88eb06eb1444 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t *entry, int entry_size);
 extern int swapcache_prepare(swp_entry_t entry, int entry_size);
-extern void swap_free(swp_entry_t);
+extern void swap_free(swp_entry_t entry, int entry_size);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
-extern int free_swap_and_cache(swp_entry_t);
+extern int free_swap_and_cache(swp_entry_t entry, int entry_size);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
@@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void)
 {
 }
 
-#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
+#define free_swap_and_cache(e, s)					\
+	({(is_migration_entry(e) || is_device_private_entry(e)); })
 #define swapcache_prepare(e, s)						\
 	({(is_migration_entry(e) || is_device_private_entry(e)); })
 
@@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size)
 	return 0;
 }
 
-static inline void swap_free(swp_entry_t swp)
+static inline void swap_free(swp_entry_t swp, int entry_size)
 {
 }
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index d7f6c1a288d3..0275df84ed3d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap)
 	offset = swp_offset(get_swap_page_of_type(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), 1);
 		else
 			return swapdev_block(swap, offset);
 	}
@@ -206,7 +206,7 @@ void free_all_swap_pages(int swap)
 		ext = rb_entry(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
 		for (offset = ext->start; offset <= ext->end; offset++)
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), 1);
 
 		kfree(ext);
 	}
diff --git a/mm/madvise.c b/mm/madvise.c
index 972a9eaa898b..6fff1c1d2009 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			if (non_swap_entry(entry))
 				continue;
 			nr_swap--;
-			free_swap_and_cache(entry);
+			free_swap_and_cache(entry, 1);
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			continue;
 		}
diff --git a/mm/memory.c b/mm/memory.c
index d36b693cef61..f48ef19070b1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
static unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -2818,7 +2818,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index 616cf1b085c5..8e79047dafa0 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -677,7 +677,7 @@ static int shmem_free_swap(struct address_space *mapping, xa_unlock_irq(&mapping->i_pages); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1212,7 +1212,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_lock_irq(&info->lock); info->swapped--; spin_unlock_irq(&info->lock); - swap_free(swap); + swap_free(swap, 1); } } return error; @@ -1751,7 +1751,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); } else { if (vma && userfaultfd_missing(vma)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 553d2551b35a..e06cc1581d1e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. 
+ * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. + */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap, } mmput(start_mm); + + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. + * find_next_to_unuse() will skip these swap entries, that is + * OK. But we need to wait until they are freed really. 
+ */ + while (!retval && READ_ONCE(si->inuse_pages)) + schedule_timeout_uninterruptible(1); + return retval; }
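To make the dual bookkeeping described in this patch concrete, here is a minimal user-space sketch of the accounting: a huge swap cluster carries one reference per swap slot plus a cluster-level PMD swap mapping count, the cluster is split once no PMD mapping remains, and freed once every slot count reaches zero. All names and sizes below (struct cluster, free_pmd_mapping, CLUSTER_SIZE) are invented for illustration; this is a model of the idea, not kernel code.

    #include <stdbool.h>
    #include <stdio.h>

    #define CLUSTER_SIZE 8                  /* stands in for SWAPFILE_CLUSTER */

    struct cluster {
            bool huge;                      /* cluster still backs a PMD mapping */
            int pmd_map_count;              /* # of PMD swap mappings */
            int map[CLUSTER_SIZE];          /* per-slot reference counts */
    };

    /* Drop one PMD swap mapping from the cluster, as __swap_free() does. */
    static void free_pmd_mapping(struct cluster *c)
    {
            int i, free_slots = 0;

            c->pmd_map_count--;             /* like cluster_add_swapcount(ci, -1) */
            for (i = 0; i < CLUSTER_SIZE; i++) {
                    /* the PMD mapping is counted in each slot too */
                    if (--c->map[i] == 0)
                            free_slots++;
            }
            if (free_slots == CLUSTER_SIZE) {
                    printf("all slot counts 0: free whole cluster\n");
                    return;
            }
            if (c->pmd_map_count == 0) {
                    c->huge = false;        /* like cluster_clear_huge(ci) */
                    printf("no PMD mapping left: cluster split\n");
            }
    }

    int main(void)
    {
            struct cluster c = { .huge = true, .pmd_map_count = 1 };
            int i;

            /* one PMD mapping, plus one extra PTE reference on slot 0 */
            for (i = 0; i < CLUSTER_SIZE; i++)
                    c.map[i] = 1;
            c.map[0]++;

            free_pmd_mapping(&c);           /* splits: slot 0 is still referenced */
            return 0;
    }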
From patchwork Tue Sep 25 07:13:33 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613501
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V5 RESEND 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Tue, 25 Sep 2018 15:13:33 +0800
Message-Id: <20180925071348.31458-7-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

A huge PMD needs to be split when zapping a part of the PMD mapping, etc. If the PMD mapping is a swap mapping, we need to split it too. This patch implements support for this. It is similar to splitting a PMD page mapping, except that we also need to decrease the PMD swap mapping count of the huge swap cluster. If the PMD swap mapping count becomes 0, the huge swap cluster will be split. Note: is_huge_zero_pmd() and pmd_page() don't work with a swap PMD, so a pmd_present() check must be done before calling them.
Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 ++++++ mm/huge_memory.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 85 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 99c19b06d9a4..0f3e1739986f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index 88eb06eb1444..5bd54b6fd4a1 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 63edf18ca9cc..3ea7318fcdcd 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1604,6 +1604,40 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, haddr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -2070,7 +2104,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2094,7 +2128,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2138,6 +2172,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2229,14 +2266,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index e06cc1581d1e..16723b9d971a 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -4034,6 +4034,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ */ + if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;
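The core of the split operation in this patch is rewriting one PMD-level swap entry as HPAGE_PMD_NR consecutive PTE-level swap entries. Below is a minimal user-space sketch of just that derivation, with invented types (struct swp stands in for swp_entry_t, HPAGE_NR for HPAGE_PMD_NR); the kernel's __split_huge_swap_pmd() additionally handles soft-dirty bits, the page table deposit, and the cluster's PMD swap mapping count.

    #include <stdio.h>

    #define HPAGE_NR 8                      /* stands in for HPAGE_PMD_NR */

    struct swp { unsigned long val; };      /* stands in for swp_entry_t */

    static void split_swap_pmd(struct swp pmd_entry, struct swp pte[HPAGE_NR])
    {
            int i;

            /*
             * In the kernel, split_swap_cluster_map() would first drop
             * the cluster's PMD swap mapping count; here we only show
             * how the PTE-level entries are derived from the PMD entry.
             */
            for (i = 0; i < HPAGE_NR; i++)
                    pte[i].val = pmd_entry.val + i; /* entry.val++ per subpage */
    }

    int main(void)
    {
            struct swp pmd = { .val = 512 };        /* cluster-aligned offset */
            struct swp pte[HPAGE_NR];
            int i;

            split_swap_pmd(pmd, pte);
            for (i = 0; i < HPAGE_NR; i++)
                    printf("pte[%d] -> swap offset %lu\n", i, pte[i].val);
            return 0;
    }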
From patchwork Tue Sep 25 07:13:34 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613505
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V5 RESEND 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Tue, 25 Sep 2018 15:13:34 +0800
Message-Id: <20180925071348.31458-8-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

When splitting a THP in the swap cache, or when failing to allocate a THP while swapping in a huge swap cluster, the huge swap cluster will be split. In addition to clearing the huge flag of the swap cluster, the PMD swap mapping count recorded in cluster_count() will be set to 0. But we do not touch the PMD swap mappings themselves, because it is sometimes hard to find them all.
When the PMD swap mappings are operated on later, it will be found that the huge swap cluster has been split, and the PMD swap mappings will be split at that time. Unless we are splitting a THP in the swap cache (specified via the "force" parameter), split_swap_cluster() will return -EEXIST if the SWAP_HAS_CACHE flag is set in swap_map[offset], because this indicates that a THP corresponds to this huge swap cluster, and it isn't desirable to split that THP. When splitting a THP in the swap cache, the call to split_swap_cluster() is moved to before unlocking the sub-pages, so that all sub-pages are kept locked from the time the THP is split until the huge swap cluster is split. This makes the code much easier to reason about. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 ++++-- mm/huge_memory.c | 18 ++++++++++------ mm/swapfile.c | 58 +++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 5bd54b6fd4a1..48c159994438 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -616,11 +616,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3ea7318fcdcd..e5d995195fd9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2497,6 +2497,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, unfreeze_page(head); + /* + * Split swap cluster before unlocking sub-pages. So all + * sub-pages will be kept locked from THP has been split to + * swap cluster is split.
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2723,12 +2734,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index 16723b9d971a..ef2b42c199c0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -4064,6 +4047,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ */ + if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void) From patchwork Tue Sep 25 07:13:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10613511 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EFEC1161F for ; Tue, 25 Sep 2018 07:14:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCF7D29B12 for ; Tue, 25 Sep 2018 07:14:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D13EC29B17; Tue, 25 Sep 2018 07:14:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D46F129B12 for ; Tue, 25 Sep 2018 07:14:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3D02D8E0073; Tue, 25 Sep 2018 03:14:06 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 381F38E0071; Tue, 25 Sep 2018 03:14:06 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 222FA8E0073; Tue, 25 Sep 2018 03:14:06 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id BFCDF8E0041 for ; Tue, 25 Sep 2018 03:14:05 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id j15-v6so11816169pff.12 for ; Tue, 25 Sep 2018 00:14:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=GlUu/+BHbVZHYZvEoYBqwPfb59a3IWFQ/2ZBJYUuCgM=; b=LsOsreTYLNVkQqToVd/spwwElwq26VLAPcSNrDVkY01o4Q9uTPBp/8lCRmJnUJ6Y8N VdwuuRhJl+TNUqM07vm9zJ/mfeys8YJfzF+4qfUEFguUUIHyIH923cbJfJ7/PGJ+aYLX EyWitJfn7mMMDM2ViTxPC//X6bwb/JmZeNfwYBNdQs4Uaoh4MtuGfFUoTc0u1e4OryZz RRFCb+JL34UzJV7R3eufvl6caKloGZHpBjs+qz0GZF++R0DQXjUIr2dJCKBunWd8Byly Gqc65iLICucWrSUNTzh5cn1zbb2HFikOofuS0Mf4IPUYdL2zrJmyB/5H95/utJ/BKg18 g1gw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: ABuFfojVFH/Fz5Skd0BlDrpzWv+2kpl2wC0/w6fQ0ARrCFzZ7pAGuUxh VWWwgkJ/OeU1FmtOILuqx/tp3Z+5ZsO1bwpiW9w+OMBnojX46Vwu/YfgTKTCiwzA/VQts4IxG3r qv2AZW20x8vujnS4bBg4dEJN21x6QiwwOYVzBlwNfTCtuReZef+FjqDAfpvscJ+yYRg== X-Received: by 2002:a62:1b44:: with SMTP id b65-v6mr2220135pfb.172.1537859645436; Tue, 25 Sep 2018 00:14:05 -0700 (PDT) X-Google-Smtp-Source: ACcGV63rGQo069ZjTjtBb8jEG4f7fRI3pqEGHVkrSf/Fm4KAv/AbDhZCYjARvJ5XM9ggNlXFtY3f X-Received: by 2002:a62:1b44:: with 
From patchwork Tue Sep 25 07:13:35 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613511
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V5 RESEND 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Tue, 25 Sep 2018 15:13:35 +0800
Message-Id: <20180925071348.31458-9-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

To swap in a THP in one piece, we need to read a huge swap cluster from the swap device. This patch revises __read_swap_cache_async() and its callers and callees to support this. If __read_swap_cache_async() finds that the swap cluster of the specified swap entry is huge, it will try to allocate a THP and add it into the swap cache.
So later the contents of the huge swap cluster can be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 38 ++++++++++++++++++++++++++ include/linux/swap.h | 4 +-- mm/huge_memory.c | 26 ------------------ mm/swap_state.c | 72 ++++++++++++++++++++++++++++++++++++------------- mm/swapfile.c | 9 ++++--- 5 files changed, 99 insertions(+), 50 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0f3e1739986f..3fdb29bc250c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,39 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +/* + * always: directly stall for all thp allocations + * defer: wake kswapd and fail if not immediately available + * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise + * fail if not immediately available + * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately + * available + * never: never stall for any thp allocation + */ +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + bool vma_madvised; + + if (!vma) + return GFP_TRANSHUGE_LIGHT; + vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? __GFP_DIRECT_RECLAIM : + __GFP_KSWAPD_RECLAIM); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? 
__GFP_DIRECT_RECLAIM : 0); + return GFP_TRANSHUGE_LIGHT; +} #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +396,11 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 48c159994438..f0424db46add 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -589,7 +589,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e5d995195fd9..4d4a447c29a8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -620,32 +620,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, } -/* - * always: directly stall for all thp allocations - * defer: wake kswapd and fail if not immediately available - * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise - * fail if not immediately available - * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately - * available - * never: never stall for any thp allocation - */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) -{ - const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); - - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - __GFP_KSWAPD_RECLAIM); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - 0); - return GFP_TRANSHUGE_LIGHT; -} - /* Caller must hold page table lock. */ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, diff --git a/mm/swap_state.c b/mm/swap_state.c index 376327b7b442..06f1b39e2fa8 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -385,7 +385,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -411,14 +413,40 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. 
*/ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. */ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp = alloc_hugepage_direct_gfpmask(vma); + + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_hugepage_vma(gfp, vma, + addr, HPAGE_PMD_ORDER); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -426,16 +454,18 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * call radix_tree_preload() while we can wait. */ - err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL); + err = radix_tree_maybe_preload_order(gfp_mask & GFP_KERNEL, + compound_order(new_page)); if (err) break; /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); - if (err == -EEXIST) { + err = swapcache_prepare(hentry, entry_size); + if (err) radix_tree_preload_end(); + if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble * across a SWAP_HAS_CACHE swap_map entry whose page @@ -443,33 +473,36 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; - } - if (err) { /* swp entry is obsolete ? */ - radix_tree_preload_end(); + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; + } else if (err) { /* swp entry is obsolete ? */ break; } /* May fail (-ENOMEM) if radix-tree node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = __add_to_swap_cache(new_page, entry); + err = __add_to_swap_cache(new_page, hentry); + radix_tree_preload_end(); if (likely(!err)) { - radix_tree_preload_end(); /* * Initiate read into locked page and return. */ SetPageWorkingset(new_page); lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } - radix_tree_preload_end(); __ClearPageLocked(new_page); /* * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -491,7 +524,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -610,8 +643,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -772,8 +806,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index ef2b42c199c0..3fe50f1da0a0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
-int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry, entry_size); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count; }
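A compact illustration of the rounding done by the revised swapin path: when __swp_swapcount() reports a huge cluster via entry_size, the target entry is rounded down to the cluster boundary so a whole cluster can be read into one THP; otherwise a single slot is read. The code below is a user-space sketch with invented names (huge_base, HPAGE_NR), not the kernel implementation.

    #include <stdio.h>

    #define HPAGE_NR 8                      /* stands in for HPAGE_PMD_NR */

    /* Mimics swp_entry(type, round_down(offset, HPAGE_PMD_NR)). */
    static unsigned long huge_base(unsigned long offset)
    {
            return offset & ~((unsigned long)HPAGE_NR - 1);
    }

    int main(void)
    {
            unsigned long fault_offset = 13;  /* a slot inside a huge cluster */
            int entry_size = HPAGE_NR;        /* as reported by __swp_swapcount() */

            if (entry_size == HPAGE_NR)
                    printf("read %d slots starting at offset %lu\n",
                           HPAGE_NR, huge_base(fault_offset));
            else
                    printf("read 1 slot at offset %lu\n", fault_offset);
            return 0;
    }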
From patchwork Tue Sep 25 07:13:36 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613519
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V5 RESEND 09/21] swap: Swapin a THP in one piece
Date: Tue, 25 Sep 2018 15:13:36 +0800
Message-Id: <20180925071348.31458-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping, it will swap in a THP in one piece. This avoids the overhead of splitting/collapsing before/after the THP swapping, and greatly improves swap performance through a reduced page fault count, etc. do_huge_pmd_swap_page() is added in this patch to implement this.
It is similar to do_swap_page() for normal page swapin. If a THP cannot
be allocated, the huge swap cluster and the PMD swap mapping will be
split, falling back to normal page swapin. If the huge swap cluster has
already been split, the PMD swap mapping will be split, falling back to
normal page swapin.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h |   9 +++
 mm/huge_memory.c        | 174 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 +++--
 3 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3fdb29bc250c..c2b8ced6fc2b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -403,4 +403,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4d4a447c29a8..747879cd0e90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -1612,6 +1614,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	pmd_populate(mm, pmd, pgtable);
 }
 
+#ifdef CONFIG_THP_SWAP
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+			 */
+			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+				/*
+				 * Failed to allocate huge page, split huge swap
+				 * cluster, and fallback to swapin normal page
+				 */
+				ret = split_swap_cluster(entry, 0);
+				/* Somebody else swapin the swap entry, retry */
+				if (ret == -EEXIST) {
+					ret = 0;
+					goto retry;
+				/* swapoff occurs under us */
+				} else if (ret == -EINVAL)
+					ret = 0;
+				else
+					goto fallback;
+			}
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			goto out;
+		}
+
+		/* Had to read the page from swap area: Major fault */
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+	} else if (!PageTransCompound(page))
+		goto fallback;
+
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!locked) {
+		ret |= VM_FAULT_RETRY;
+		goto out_release;
+	}
+
+	/*
+	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+	 * release the swapcache from under us. The page pin, and pmd_same
+	 * test below, are not enough to exclude that. Even if it is still
+	 * swapcache, we need to check that the page's swap has not changed.
+	 */
+	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+		goto out_page;
+
+	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
+					&memcg, true)) {
+		ret = VM_FAULT_OOM;
+		goto out_page;
+	}
+
+	/*
+	 * Back out if somebody else already faulted in this pmd.
+	 */
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+		goto out_nomap;
+
+	if (unlikely(!PageUptodate(page))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_nomap;
+	}
+
+	/*
+	 * The page isn't present yet, go ahead with the fault.
+	 *
+	 * Be careful about the sequence of operations here.
+	 * To get its accounting right, reuse_swap_page() must be called
+	 * while the page is counted on swap but not yet in mapcount i.e.
+	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+	 * must be called after the swap_free(), or it will never succeed.
+	 */
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	pmd = mk_huge_pmd(page, vma->vm_page_prot);
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+		pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+		vmf->flags &= ~FAULT_FLAG_WRITE;
+		ret |= VM_FAULT_WRITE;
+		exclusive = RMAP_EXCLUSIVE;
+	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+	if (pmd_swp_soft_dirty(orig_pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+	do_page_add_anon_rmap(page, vma, haddr,
+			      exclusive | RMAP_COMPOUND);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	activate_page(page);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+	swap_free(entry, HPAGE_PMD_NR);
+	if (mem_cgroup_swap_full(page) ||
+	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+		try_to_free_swap(page);
+	unlock_page(page);
+
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		spin_unlock(vmf->ptl);
+		ret |= do_huge_pmd_wp_page(vmf, pmd);
+		if (ret & VM_FAULT_ERROR)
+			ret &= VM_FAULT_ERROR;
+		goto out;
+	}
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+out:
+	return ret;
+out_nomap:
+	mem_cgroup_cancel_charge(page, memcg, true);
+	spin_unlock(vmf->ptl);
+out_page:
+	unlock_page(page);
+out_release:
+	put_page(page);
+	return ret;
+fallback:
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+		ret = VM_FAULT_FALLBACK;
+	else
+		ret = 0;
+	if (page)
+		put_page(page);
+	return ret;
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
diff --git a/mm/memory.c b/mm/memory.c
index f48ef19070b1..bccfaae7463d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3834,13 +3834,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 
 	barrier();
 	if (unlikely(is_swap_pmd(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		if (is_pmd_migration_entry(orig_pmd))
+		if (thp_migration_supported() &&
+		    is_pmd_migration_entry(orig_pmd)) {
 			pmd_migration_entry_wait(mm, vmf.pmd);
-		return 0;
-	}
-	if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+			return 0;
+		} else if (IS_ENABLED(CONFIG_THP_SWAP)) {
+			ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+			if (!(ret & VM_FAULT_FALLBACK))
+				return ret;
+		} else
+			VM_BUG_ON(1);
+	} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 			return do_huge_pmd_numa_page(&vmf, orig_pmd);
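In outline, the swapin path above boils down to a three-way decision. The following is a condensed sketch of do_huge_pmd_swap_page()'s fallback ladder, with locking, reference counting, and accounting stripped away; it is a paraphrase for orientation, not code from the patch:

/* Editor's sketch (not from the patch): the whole-THP swapin
 * fallback ladder, simplified. */
static vm_fault_t thp_swapin_outline(struct vm_fault *vmf, swp_entry_t entry)
{
	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
	struct page *page;

retry:
	page = lookup_swap_cache(entry, NULL, vmf->address);
	if (!page)
		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
					     vmf->vma, haddr, false);
	if (!page) {
		/* No THP available: split the huge swap cluster ... */
		if (split_swap_cluster(entry, 0) == -EEXIST)
			goto retry;	/* raced with another swapin */
		/* ... and let the PTE-level path swap in normal pages */
		return VM_FAULT_FALLBACK;
	}
	if (!PageTransCompound(page))
		/* cluster was split earlier: split the PMD mapping too */
		return VM_FAULT_FALLBACK;

	/* otherwise map the whole THP with one huge PMD entry */
	return 0;
}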
From patchwork Tue Sep 25 07:13:37 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 10/21] swap: Support to count THP swapin and its fallback
Date: Tue, 25 Sep 2018 15:13:37 +0800
Message-Id: <20180925071348.31458-11-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

Two new /proc/vmstat fields, "thp_swpin" and "thp_swpin_fallback", are
added to count swapping a THP in from the swap device in one piece, and
falling back to normal page swapin, respectively.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/vm_event_item.h              |  2 ++
 mm/huge_memory.c                           |  4 +++-
 mm/page_io.c                               | 15 ++++++++++++---
 mm/vmstat.c                                |  2 ++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..85e33f785fd7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -364,6 +364,14 @@ thp_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.
 
+thp_swpin
+	is incremented every time a huge page is swapped in in one
+	piece without splitting.
+
+thp_swpin_fallback
+	is incremented if a huge page has to be split during swapin.
+	Usually because of failure to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use.
 There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 47a3441cf4c4..c20b655cfdcc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 	THP_ZERO_PAGE_ALLOC_FAILED,
 	THP_SWPOUT,
 	THP_SWPOUT_FALLBACK,
+	THP_SWPIN,
+	THP_SWPIN_FALLBACK,
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 	BALLOON_INFLATE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 747879cd0e90..6f8676d6cba0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1668,8 +1668,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			/* swapoff occurs under us */
 			} else if (ret == -EINVAL)
 				ret = 0;
-			else
+			else {
+				count_vm_event(THP_SWPIN_FALLBACK);
 				goto fallback;
+			}
 		}
 		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 		goto out;
diff --git a/mm/page_io.c b/mm/page_io.c
index aafd19ec1db4..362254b99955 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return ret;
 }
 
+static inline void count_swpin_vm_event(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (unlikely(PageTransHuge(page)))
+		count_vm_event(THP_SWPIN);
+#endif
+	count_vm_events(PSWPIN, hpage_nr_pages(page));
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
@@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous)
 
 		ret = mapping->a_ops->readpage(swap_file, page);
 		if (!ret)
-			count_vm_event(PSWPIN);
+			count_swpin_vm_event(page);
 		return ret;
 	}
 
@@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous)
 			unlock_page(page);
 		}
 
-		count_vm_event(PSWPIN);
+		count_swpin_vm_event(page);
 		return 0;
 	}
 
@@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	get_task_struct(current);
 	bio->bi_private = current;
 	bio_set_op_attrs(bio, REQ_OP_READ, 0);
-	count_vm_event(PSWPIN);
+	count_swpin_vm_event(page);
 	bio_get(bio);
 	qc = submit_bio(bio);
 	while (synchronous) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index e3fb28d2c923..b954131d9e28 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1264,6 +1264,8 @@ const char * const vmstat_text[] = {
 	"thp_zero_page_alloc_failed",
 	"thp_swpout",
 	"thp_swpout_fallback",
+	"thp_swpin",
+	"thp_swpin_fallback",
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 	"balloon_inflate",
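The new counters can be watched from userspace. A minimal sketch that reads them from /proc/vmstat; the helper below is illustrative, and only the field names "thp_swpin" and "thp_swpin_fallback" come from the patch:

#include <stdio.h>
#include <string.h>

/* Read one counter from /proc/vmstat; returns -1 if it is absent. */
static long vmstat_read(const char *field)
{
	char name[64];
	long value;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fscanf(f, "%63s %ld", name, &value) == 2) {
		if (!strcmp(name, field)) {
			fclose(f);
			return value;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	printf("thp_swpin=%ld thp_swpin_fallback=%ld\n",
	       vmstat_read("thp_swpin"),
	       vmstat_read("thp_swpin_fallback"));
	return 0;
}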
From patchwork Tue Sep 25 07:13:38 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 11/21] swap: Add sysfs interface to configure THP swapin
Date: Tue, 25 Sep 2018 15:13:38 +0800
Message-Id: <20180925071348.31458-12-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

Swapping in a THP as a whole isn't desirable in some situations. For
example, for a completely random access pattern, swapping in a THP in
one piece will greatly inflate the amount of data read. So a sysfs
interface, /sys/kernel/mm/transparent_hugepage/swapin_enabled, is added
to configure it. The following three options are provided:

- always: THP swapin is always enabled.
- madvise: THP swapin is enabled only for VMAs with the VM_HUGEPAGE
  flag set.
- never: THP swapin is always disabled.

The default configuration is madvise. During a page fault, if a PMD
swap mapping is found and THP swapin is disabled, the huge swap cluster
and the PMD swap mapping will be split, falling back to normal page
swapin.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++++
 include/linux/huge_mm.h                    | 31 ++++++++++
 mm/huge_memory.c                           | 94 ++++++++++++++++++++++------
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 85e33f785fd7..23aefb17101c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+Transparent hugepages may be swapped out and swapped in in one piece
+without splitting. This improves the utility of transparent hugepages
+but may also inflate the amount of data read or written. Whether to
+enable swapping in a transparent hugepage in one piece can be
+configured as follows:
+
+	echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+	Attempt to allocate a transparent huge page and read it from
+	swap space in one piece every time.
+
+never
+	Always split the swap space and PMD swap mapping, and swap in
+	the faulting normal page, during swapin.
+
+madvise
+	Only swap in the transparent huge page in one piece for
+	MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c2b8ced6fc2b..9dedff974def 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -63,6 +63,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+	TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+	TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };
 
 struct kobject;
@@ -405,11 +407,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 
 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_NOHUGEPAGE)
+		return false;
+
+	if (is_vma_temporary_stack(vma))
+		return false;
+
+	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+		return true;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+		return !!(vma->vm_flags & VM_HUGEPAGE);
+
+	return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif /* CONFIG_THP_SWAP */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f8676d6cba0..4d41ce83e3b9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #endif
	(1<
[the rest of this hunk and the swapin_enabled sysfs show/store hunks
are garbled beyond recovery in the archive]
@@ ... @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	page = lookup_swap_cache(entry, NULL, vmf->address);
 	if (!page) {
+		if (!transparent_hugepage_swapin_enabled(vma))
+			goto split;
+
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
 					     haddr, false);
 		if (!page) {
@@ -1655,24 +1709,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			/*
 			 * Back out if somebody else faulted in this pmd
 			 * while we released the pmd lock.
 			 */
-			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
-				/*
-				 * Failed to allocate huge page, split huge swap
-				 * cluster, and fallback to swapin normal page
-				 */
-				ret = split_swap_cluster(entry, 0);
-				/* Somebody else swapin the swap entry, retry */
-				if (ret == -EEXIST) {
-					ret = 0;
-					goto retry;
-				/* swapoff occurs under us */
-				} else if (ret == -EINVAL)
-					ret = 0;
-				else {
-					count_vm_event(THP_SWPIN_FALLBACK);
-					goto fallback;
-				}
-			}
+			if (likely(pmd_same(*vmf->pmd, orig_pmd)))
+				goto split;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto out;
 		}
@@ -1785,6 +1823,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (page)
 		put_page(page);
 	return ret;
+split:
+	/*
+	 * Failed to allocate huge page, split huge swap cluster, and
+	 * fallback to swapin normal page
+	 */
+	ret = split_swap_cluster(entry, 0);
+	/* Somebody else swapin the swap entry, retry */
+	if (ret == -EEXIST) {
+		ret = 0;
+		goto retry;
+	}
+	/* swapoff occurs under us */
+	if (ret == -EINVAL) {
+		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+		return 0;
+	}
+	count_vm_event(THP_SWPIN_FALLBACK);
+	goto fallback;
 }
 #endif
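Under the default "madvise" policy, whole-THP swapin applies only to VMAs marked with VM_HUGEPAGE. A small userspace sketch of how a process opts a region in; the mapping size and usage are illustrative, while MADV_HUGEPAGE itself is the standard Linux madvise flag:

#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* one PMD-sized huge page on x86-64 */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	/* Sets VM_HUGEPAGE on the VMA, so swapin_enabled=madvise applies */
	if (madvise(buf, len, MADV_HUGEPAGE))
		return 1;
	/* ... touch the memory; after swapout, a fault here may swap
	 * the whole THP back in one piece ... */
	return 0;
}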
From patchwork Tue Sep 25 07:13:39 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 RESEND 12/21] swap: Support PMD swap mapping in swapoff Date: Tue, 25 Sep 2018 15:13:39 +0800 Message-Id: <20180925071348.31458-13-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com> References: <20180925071348.31458-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +------ include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index eb1e9d17371b..d64cef2bff04 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
 	 */
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-	    (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+	    (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval)))
 		return 1;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9dedff974def..25ba9b5f1e60 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -406,6 +406,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 static inline bool transparent_hugepage_swapin_enabled(
@@ -431,6 +433,12 @@ static inline bool transparent_hugepage_swapin_enabled(
 	return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+				      unsigned long address, pmd_t orig_pmd)
+{
+	return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4d41ce83e3b9..18da840bd049 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1666,8 +1666,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_THP_SWAP
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			       unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			unsigned long address, pmd_t orig_pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3fe50f1da0a0..64067ee6a09c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
 	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }
 
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+	return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	return ret;
 }
 
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	struct mem_cgroup *memcg;
+	spinlock_t *ptl;
+	int ret = 1;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = -ENOMEM;
+		goto out_nolock;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+		mem_cgroup_cancel_charge(page, memcg, true);
+		ret = 0;
+		goto out;
+	}
+
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	get_page(page);
+	set_pmd_at(vma->vm_mm, addr, pmd,
+		   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr, true);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	swap_free(entry, HPAGE_PMD_NR);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+out:
+	spin_unlock(ptl);
+out_nolock:
+	return ret;
+}
+#else
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			   unsigned long addr, unsigned long end,
 			   swp_entry_t entry, struct page *page)
@@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 				  unsigned long addr, unsigned long end,
 				  swp_entry_t entry, struct page *page)
 {
-	pmd_t *pmd;
+	pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
 	unsigned long next;
 	int ret;
 
@@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	do {
 		cond_resched();
 		next = pmd_addr_end(addr, end);
+		orig_pmd = *pmd;
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) {
+			if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+				continue;
+			/*
+			 * Huge cluster has been split already, split
+			 * PMD swap mapping and fallback to unuse PTE
+			 */
+			if (!PageTransCompound(page)) {
+				ret = split_huge_swap_pmd(vma, pmd,
+							  addr, orig_pmd);
+				if (ret)
+					return ret;
+				ret = unuse_pte_range(vma, pmd, addr,
+						      next, entry, page);
+			} else
+				ret = unuse_pmd(vma, pmd, addr, entry, page);
+			if (ret)
+				return ret;
+			continue;
+		}
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
 		if (signal_pending(current)) {
 			retval = -EINTR;
 			break;
@@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
 		page = read_swap_cache_async(entry,
 					     GFP_HIGHUSER_MOVABLE, NULL, 0, false);
 		if (!page) {
+			struct swap_cluster_info *ci = NULL;
+
 			/*
 			 * Either swap_duplicate() failed because entry
 			 * has been freed independently, and will not be
@@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap,
 			 */
 			if (!swcount || swcount == SWAP_MAP_BAD)
 				continue;
+			if (si->cluster_info)
+				ci = si->cluster_info + i / SWAPFILE_CLUSTER;
+			/* Split huge cluster if failed to allocate huge page */
+			if (cluster_is_huge(ci)) {
+				retval = split_swap_cluster(entry, 0);
+				if (!retval || retval == -EEXIST)
+					goto retry;
+			}
 			retval = -ENOMEM;
 			break;
 		}
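Condensing the unuse logic above: for each PMD swap mapping that matches the entry being unused, the choice between whole-THP and per-PTE processing looks roughly like the following sketch, simplified from the hunk above with error handling elided; it is a paraphrase, not code from the patch:

/* Editor's sketch: unuse one matching PMD swap mapping, assuming the
 * page has already been read back from swap. */
static int unuse_one_pmd(struct vm_area_struct *vma, pmd_t *pmd,
			 unsigned long addr, unsigned long next,
			 swp_entry_t entry, struct page *page)
{
	if (!PageTransCompound(page)) {
		/* huge cluster already split: split the PMD swap
		 * mapping and fall back to per-PTE unuse */
		int ret = split_huge_swap_pmd(vma, pmd, addr, *pmd);

		return ret ? ret : unuse_pte_range(vma, pmd, addr, next,
						   entry, page);
	}
	/* otherwise map the whole THP back with one huge PMD */
	return unuse_pmd(vma, pmd, addr, entry, page);
}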
From patchwork Tue Sep 25 07:13:40 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 13/21] swap: Support PMD swap mapping in madvise_free()
Date: Tue, 25 Sep 2018 15:13:40 +0800
Message-Id: <20180925071348.31458-14-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

When madvise_free() finds a PMD swap mapping, if only part of the huge
swap cluster is operated on, the PMD swap mapping will be split,
falling back to PTE swap mapping processing. Otherwise, if the whole
huge swap cluster is operated on, free_swap_and_cache() will be called
to decrease the PMD swap mapping count and probably free the swap space
and the THP in the swap cache too.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++-----------------
 mm/madvise.c     |  2 +-
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 18da840bd049..aee8614e99f7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1844,6 +1844,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 #endif
 
+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+	pgtable_t pgtable;
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pte_free(mm, pgtable);
+	mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -1864,15 +1873,39 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		goto out_unlocked;
 
 	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
 	if (unlikely(!pmd_present(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		goto out;
+		swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+		if (is_migration_entry(entry)) {
+			VM_BUG_ON(!thp_migration_supported());
+			goto out;
+		} else if (IS_ENABLED(CONFIG_THP_SWAP) &&
+			   !non_swap_entry(entry)) {
+			/*
+			 * If part of THP is discarded, split the PMD
+			 * swap mapping and operate on the PTEs
+			 */
+			if (next - addr != HPAGE_PMD_SIZE) {
+				unsigned long haddr = addr & HPAGE_PMD_MASK;
+
+				__split_huge_swap_pmd(vma, haddr, pmd);
+				goto out;
+			}
+			free_swap_and_cache(entry, HPAGE_PMD_NR);
+			pmd_clear(pmd);
+			zap_deposited_table(mm, pmd);
+			if (current->mm == mm)
+				sync_mm_rss(mm);
+			add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+			ret = true;
+			goto out;
+		} else
+			VM_BUG_ON(1);
 	}
 
+	if (is_huge_zero_pmd(orig_pmd))
+		goto out;
+
 	page = pmd_page(orig_pmd);
 	/*
 	 * If other processes are mapping this page, we couldn't discard
@@ -1918,15 +1951,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-	pgtable_t pgtable;
-
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-	pte_free(mm, pgtable);
-	mm_dec_nr_ptes(mm);
-}
-
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 6fff1c1d2009..07ef599d4255 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		unsigned long next;
 
 		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*pmd))
+		if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd))
 			if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
 				goto next;
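The whole-cluster path above is taken only when the MADV_FREE range covers the entire huge page; otherwise the PMD swap mapping is split first. A userspace sketch of the two cases; the buffer setup is assumed, the sizes are illustrative, and MADV_FREE is the standard Linux madvise flag:

#include <sys/mman.h>

/* Assume buf points at 2 MiB-aligned, THP-backed anonymous memory that
 * has been swapped out, so its PMD holds a swap mapping. */
void madv_free_examples(char *buf, size_t thp_size)
{
	/* Covers the whole huge swap cluster: the kernel can free the
	 * swap space via free_swap_and_cache() without splitting. */
	madvise(buf, thp_size, MADV_FREE);

	/* Covers only part of it: the PMD swap mapping is split and
	 * the range is processed per PTE instead. */
	madvise(buf, thp_size / 2, MADV_FREE);
}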
From patchwork Tue Sep 25 07:13:41 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 14/21] swap: Support to move swap account for PMD swap mapping
Date: Tue, 25 Sep 2018 15:13:41 +0800
Message-Id: <20180925071348.31458-15-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP was swapped
out.  Now, to support swapping the THP back in as one piece, the huge
swap cluster is no longer split after the THP is reclaimed.  So in
memcg, we need to move the swap account for PMD swap mappings in the
process's page table.

When the page table is scanned during memcg charge moving, PMD swap
mappings are identified, and mem_cgroup_move_swap_account() and its
callees are revised to move the account for the whole huge swap
cluster.  If the swap cluster mapped by the PMD has already been
split, the PMD swap mapping is split too and we fall back to PTE
processing.
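To make the extended calling convention concrete, here is a minimal
sketch (illustrative only, not code from this patch) of moving the
charge for a whole PMD swap mapping; "mc" is the charge-moving context
used inside mm/memcontrol.c:

	/* Illustrative sketch: move the swap charge for one PMD swap mapping. */
	static void move_pmd_swap_charge(swp_entry_t entry)
	{
		/* All HPAGE_PMD_NR entries of the huge swap cluster move at once. */
		if (!mem_cgroup_move_swap_account(entry, mc.from, mc.to,
						  HPAGE_PMD_NR)) {
			mc.precharge -= HPAGE_PMD_NR;
			mc.moved_swap += HPAGE_PMD_NR;
		}
	}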
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 ++++ include/linux/swap.h | 6 +++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 8 +-- mm/memcontrol.c | 129 ++++++++++++++++++++++++++++++++++---------- mm/swap_cgroup.c | 45 +++++++++++++--- mm/swapfile.c | 14 +++++ 7 files changed, 174 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 25ba9b5f1e60..6586c1bfac21 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -406,6 +406,9 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -433,6 +436,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index f0424db46add..74221adc4000 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -621,6 +621,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -631,6 +632,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index aee8614e99f7..35c7243720bc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1631,10 +1631,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long haddr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1665,7 +1666,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0da010a4b3bf..28a8b50c64da 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2660,9 +2660,10 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 #ifdef CONFIG_MEMCG_SWAP
 /**
  * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
- * @entry: swap entry to be moved
+ * @entry: the first swap entry to be moved
  * @from:  mem_cgroup which the entry is moved from
  * @to:  mem_cgroup which the entry is moved to
+ * @nr_ents: number of swap entries
  *
  * It succeeds only when the swap_cgroup's record for this entry is the same
  * as the mem_cgroup's id of @from.
@@ -2673,23 +2674,27 @@ void mem_cgroup_split_huge_fixup(struct page *head)
  * both res and memsw, and called css_get().
  */
 static int mem_cgroup_move_swap_account(swp_entry_t entry,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+					struct mem_cgroup *from,
+					struct mem_cgroup *to,
+					unsigned int nr_ents)
 {
 	unsigned short old_id, new_id;
 
 	old_id = mem_cgroup_id(from);
 	new_id = mem_cgroup_id(to);
 
-	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
-		mod_memcg_state(from, MEMCG_SWAP, -1);
-		mod_memcg_state(to, MEMCG_SWAP, 1);
+	if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) {
+		mod_memcg_state(from, MEMCG_SWAP, -nr_ents);
+		mod_memcg_state(to, MEMCG_SWAP, nr_ents);
 		return 0;
 	}
 	return -EINVAL;
 }
 #else
 static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+					struct mem_cgroup *from,
+					struct mem_cgroup *to,
+					unsigned int nr_ents)
 {
 	return -EINVAL;
 }
@@ -4644,6 +4649,7 @@ enum mc_target_type {
 	MC_TARGET_PAGE,
 	MC_TARGET_SWAP,
 	MC_TARGET_DEVICE,
+	MC_TARGET_FALLBACK,
 };
 
 static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
@@ -4710,6 +4716,26 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 }
 #endif
 
+static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
+			pmd_t pmd, swp_entry_t *entry)
+{
+	struct page *page = NULL;
+	swp_entry_t ent = pmd_to_swp_entry(pmd);
+
+	if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent))
+		return NULL;
+
+	/*
+	 * Because lookup_swap_cache() updates some statistics counter,
+	 * we call find_get_page() with swapper_space directly.
+	 */
+	page = find_get_page(swap_address_space(ent), swp_offset(ent));
+	if (do_memsw_account())
+		entry->val = ent.val;
+
+	return page;
+}
+
 static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 			unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
@@ -4898,7 +4924,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 	 * There is a swap entry and a page doesn't exist or isn't charged.
 	 * But we cannot move a tail-page in a THP.
 	 */
-	if (ent.val && !ret && (!page || !PageTransCompound(page)) &&
+	if (ent.val && !ret &&
+	    ((page && !PageTransCompound(page)) ||
+	     (!page && get_swap_entry_size(ent) == 1)) &&
 	    mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
 		ret = MC_TARGET_SWAP;
 		if (target)
@@ -4909,37 +4937,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
- * We don't consider PMD mapped swapping or file mapped pages because THP does
- * not support them for now.
- * Caller should make sure that pmd_trans_huge(pmd) is true.
+ * We don't consider file mapped pages because THP does not support
+ * them for now.
 */
static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t pmd, union mc_target *target)
+		unsigned long addr, pmd_t *pmdp, union mc_target *target)
 {
+	pmd_t pmd = *pmdp;
 	struct page *page = NULL;
 	enum mc_target_type ret = MC_TARGET_NONE;
+	swp_entry_t ent = { .val = 0 };
 
 	if (unlikely(is_swap_pmd(pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(pmd));
-		return ret;
+		if (is_pmd_migration_entry(pmd)) {
+			VM_BUG_ON(!thp_migration_supported());
+			return ret;
+		}
+		if (!IS_ENABLED(CONFIG_THP_SWAP)) {
+			VM_BUG_ON(1);
+			return ret;
+		}
+		page = mc_handle_swap_pmd(vma, pmd, &ent);
+		/* The swap cluster has been split under us */
+		if ((page && !PageTransHuge(page)) ||
+		    (!page && ent.val && get_swap_entry_size(ent) == 1)) {
+			__split_huge_swap_pmd(vma, addr, pmdp);
+			ret = MC_TARGET_FALLBACK;
+			goto out;
+		}
+	} else {
+		page = pmd_page(pmd);
+		get_page(page);
 	}
-	page = pmd_page(pmd);
-	VM_BUG_ON_PAGE(!page || !PageHead(page), page);
+	VM_BUG_ON_PAGE(page && !PageHead(page), page);
 	if (!(mc.flags & MOVE_ANON))
-		return ret;
-	if (page->mem_cgroup == mc.from) {
+		goto out;
+	if (!page && !ent.val)
+		goto out;
+	if (page && page->mem_cgroup == mc.from) {
 		ret = MC_TARGET_PAGE;
 		if (target) {
 			get_page(page);
 			target->page = page;
 		}
 	}
+	if (ent.val && !ret && !page &&
+	    mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
+		ret = MC_TARGET_SWAP;
+		if (target)
+			target->ent = ent;
+	}
+out:
+	if (page)
+		put_page(page);
 	return ret;
 }
 #else
 static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t pmd, union mc_target *target)
+		unsigned long addr, pmd_t *pmdp, union mc_target *target)
 {
 	return MC_TARGET_NONE;
 }
@@ -4952,6 +5007,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *pte;
 	spinlock_t *ptl;
+	int ret;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -4960,12 +5016,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 		 * support transparent huge page with MEMORY_DEVICE_PUBLIC or
 		 * MEMORY_DEVICE_PRIVATE but this might change.
 		 */
-		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
-			mc.precharge += HPAGE_PMD_NR;
+		ret = get_mctgt_type_thp(vma, addr, pmd, NULL);
 		spin_unlock(ptl);
+		if (ret == MC_TARGET_FALLBACK)
+			goto fallback;
+		if (ret)
+			mc.precharge += HPAGE_PMD_NR;
 		return 0;
 	}
 
+fallback:
 	if (pmd_trans_unstable(pmd))
 		return 0;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -5156,6 +5216,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 	enum mc_target_type target_type;
 	union mc_target target;
 	struct page *page;
+	swp_entry_t ent;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -5163,8 +5224,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			spin_unlock(ptl);
 			return 0;
 		}
-		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
-		if (target_type == MC_TARGET_PAGE) {
+		target_type = get_mctgt_type_thp(vma, addr, pmd, &target);
+		switch (target_type) {
+		case MC_TARGET_PAGE:
 			page = target.page;
 			if (!isolate_lru_page(page)) {
 				if (!mem_cgroup_move_account(page, true,
@@ -5175,7 +5237,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 				putback_lru_page(page);
 			}
 			put_page(page);
-		} else if (target_type == MC_TARGET_DEVICE) {
+			break;
+		case MC_TARGET_DEVICE:
 			page = target.page;
 			if (!mem_cgroup_move_account(page, true,
 						     mc.from, mc.to)) {
@@ -5183,9 +5246,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 				mc.moved_charge += HPAGE_PMD_NR;
 			}
 			put_page(page);
+			break;
+		case MC_TARGET_SWAP:
+			ent = target.ent;
+			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to,
+							  HPAGE_PMD_NR)) {
+				mc.precharge -= HPAGE_PMD_NR;
+				mc.moved_swap += HPAGE_PMD_NR;
+			}
+			break;
+		default:
+			break;
 		}
 		spin_unlock(ptl);
-		return 0;
+		if (target_type != MC_TARGET_FALLBACK)
+			return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -5195,7 +5270,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 	for (; addr != end; addr += PAGE_SIZE) {
 		pte_t ptent = *(pte++);
 		bool device = false;
-		swp_entry_t ent;
 
 		if (!mc.precharge)
 			break;
@@ -5229,7 +5303,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			break;
 		case MC_TARGET_SWAP:
 			ent = target.ent;
-			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
+			if (!mem_cgroup_move_swap_account(ent, mc.from,
+							  mc.to, 1)) {
 				mc.precharge--;
 				/* we fixup refcnts and charges later. */
 				mc.moved_swap++;
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 45affaef3bc6..ccc08e88962a 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
 
 /**
  * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry.
- * @ent: swap entry to be cmpxchged
+ * @ent: the first swap entry to be cmpxchged
  * @old: old id
 * @new: new id
+ * @nr_ents: number of swap entries
 *
 * Returns old id at success, 0 at failure.
 * (There is no mem_cgroup using 0 as its id)
 */
unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
-				   unsigned short old, unsigned short new)
+				   unsigned short old, unsigned short new,
+				   unsigned int nr_ents)
{
	struct swap_cgroup_ctrl *ctrl;
-	struct swap_cgroup *sc;
+	struct swap_cgroup *sc_start, *sc;
	unsigned long flags;
	unsigned short retval;
+	pgoff_t offset_start = swp_offset(ent), offset;
+	pgoff_t end = offset_start + nr_ents;
 
-	sc = lookup_swap_cgroup(ent, &ctrl);
+	sc_start = lookup_swap_cgroup(ent, &ctrl);
	spin_lock_irqsave(&ctrl->lock, flags);
-	retval = sc->id;
-	if (retval == old)
+	sc = sc_start;
+	offset = offset_start;
+	for (;;) {
+		if (sc->id != old) {
+			retval = 0;
+			goto out;
+		}
+		offset++;
+		if (offset == end)
+			break;
+		if (offset % SC_PER_PAGE)
+			sc++;
+		else
+			sc = __lookup_swap_cgroup(ctrl, offset);
+	}
+
+	sc = sc_start;
+	offset = offset_start;
+	for (;;) {
		sc->id = new;
-	else
-		retval = 0;
+		offset++;
+		if (offset == end)
+			break;
+		if (offset % SC_PER_PAGE)
+			sc++;
+		else
+			sc = __lookup_swap_cgroup(ctrl, offset);
+	}
+	retval = old;
+out:
	spin_unlock_irqrestore(&ctrl->lock, flags);
	return retval;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 64067ee6a09c..bff2cb7badbb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
	return map_swapcount;
}
 
+#ifdef CONFIG_THP_SWAP
+int get_swap_entry_size(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+
+	si = _swap_info_get(entry);
+	if (!si || !si->cluster_info)
+		return 1;
+	ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER;
+	return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1;
+}
+#endif
+
 /*
  * We can write to an anon page without COW if there are no other references
  * to it.  And as a side-effect, free up its swap: because the old content
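As a usage note, here is a minimal sketch (illustrative only, not part
of the patch) of how callers are expected to use get_swap_entry_size():
probe whether the entry still belongs to a huge swap cluster and pick
the batch size accordingly, falling back to a single entry once the
cluster has been split under us:

	/* Illustrative sketch: how many swap entries to operate on at once. */
	static unsigned int pmd_swap_batch(swp_entry_t entry)
	{
		/* SWAPFILE_CLUSTER == HPAGE_PMD_NR when CONFIG_THP_SWAP=y */
		if (get_swap_entry_size(entry) == SWAPFILE_CLUSTER)
			return SWAPFILE_CLUSTER;	/* still a huge cluster */
		return 1;				/* split under us */
	}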
From patchwork Tue Sep 25 07:13:42 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613527
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 15/21] swap: Support to copy PMD swap mapping when fork()
Date: Tue, 25 Sep 2018 15:13:42 +0800
Message-Id: <20180925071348.31458-16-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

During fork(), the page table is copied from the parent to the child.
A PMD swap mapping needs to be copied too, and the swap reference
count needs to be increased.  When the huge swap cluster has already
been split, we split the PMD swap mapping as well and fall back to
copying PTEs.  When swap count continuation fails to allocate a page
with GFP_ATOMIC, we drop the page table locks and retry with
GFP_KERNEL.
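A minimal sketch of the copy-time logic for a PMD swap mapping
(illustrative only; the locking, the migration-entry case, and the
mmlist handling of the real hunk below are omitted):

	/* Illustrative sketch: duplicate a PMD swap mapping during fork(). */
	static int copy_pmd_swap_mapping(struct mm_struct *dst_mm, pmd_t *dst_pmd,
					 pmd_t pmd, swp_entry_t entry,
					 unsigned long addr, pgtable_t pgtable)
	{
		/* Take one swap reference per subpage of the huge cluster. */
		int ret = swap_duplicate(&entry, HPAGE_PMD_NR);

		if (!ret) {
			add_mm_counter(dst_mm, MM_SWAPENTS, HPAGE_PMD_NR);
			mm_inc_nr_ptes(dst_mm);
			pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
			set_pmd_at(dst_mm, addr, dst_pmd, pmd);
		}
		/*
		 * -ENOTDIR: the cluster was split; the caller splits the PMD
		 * swap mapping and falls back to PTE copying.  -ENOMEM: the
		 * caller drops the PMD locks and retries with GFP_KERNEL.
		 */
		return ret;
	}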
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 35c7243720bc..c569e5e8ee17 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -941,6 +941,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -948,26 +949,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
+				if (unlikely(list_empty(&dst_mm->mmlist))) {
+					spin_lock(&mmlist_lock);
+					if (list_empty(&dst_mm->mmlist))
+						list_add(&dst_mm->mmlist,
+							 &src_mm->mmlist);
+					spin_unlock(&mmlist_lock);
+				}
+			} else if (ret == -ENOTDIR) {
+				/*
+				 * The huge swap cluster has been split, split
+				 * the PMD swap mapping and fallback to PTE
+				 */
+				__split_huge_swap_pmd(vma, addr, src_pmd);
+				pte_free(dst_mm, pgtable);
+			} else if (ret == -ENOMEM) {
+				spin_unlock(src_ptl);
+				spin_unlock(dst_ptl);
+				ret = add_swap_count_continuation(entry,
+								  GFP_KERNEL);
+				if (ret < 0) {
+					ret = -ENOMEM;
+					pte_free(dst_mm, pgtable);
+					goto out;
+				}
+				goto retry;
+			} else
+				VM_BUG_ON(1);
+			goto out_unlock;
+		}
+		VM_BUG_ON(1);
+	}
 
 	if (unlikely(!pmd_trans_huge(pmd))) {
 		pte_free(dst_mm, pgtable);

From patchwork Tue Sep 25 07:13:43 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613531
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 RESEND 16/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Tue, 25 Sep 2018 15:13:43 +0800 Message-Id: <20180925071348.31458-17-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com> References: <20180925071348.31458-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c569e5e8ee17..accbd54d0ed4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2019,7 +2019,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2032,17 +2032,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2050,7 +2060,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Tue Sep 25 07:13:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10613533 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org 
From patchwork Tue Sep 25 07:13:44 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613533
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 17/21] swap: Support PMD swap mapping for MADV_WILLNEED
Date: Tue, 25 Sep 2018 15:13:44 +0800
Message-Id: <20180925071348.31458-18-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled
for the VMA, the whole swap cluster will be swapped in.  Otherwise,
the huge swap cluster and the PMD swap mapping will be split, falling
back to PTE swap mappings.
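A hedged userspace illustration of the behaviour described above (the
2MB size and alignment are illustrative; whether the cluster is read
back in one piece depends on the VMA's THP swapin setting):

	#include <stdio.h>
	#include <sys/mman.h>

	#define SZ_2M (2UL << 20)	/* assumes HPAGE_PMD_SIZE == 2MB */

	/* p: PMD-aligned anonymous region whose THP has been swapped out. */
	static int prefetch_swapped_thp(void *p)
	{
		/*
		 * With THP swapin enabled, the whole 2MB cluster is read back
		 * in one piece; otherwise the kernel splits the cluster and
		 * the PMD swap mapping and prefetches page by page via PTEs.
		 */
		if (madvise(p, SZ_2M, MADV_WILLNEED)) {
			perror("madvise(MADV_WILLNEED)");
			return -1;
		}
		return 0;
	}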
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 07ef599d4255..608c5ae201c6 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Tue Sep 25 07:13:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10613535 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B1C7161F for ; Tue, 25 Sep 2018 07:14:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3A58029B12 for ; Tue, 25 Sep 2018 07:14:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2E27A29B17; Tue, 25 Sep 2018 07:14:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9155C29B12 for ; Tue, 25 Sep 2018 07:14:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 759908E007B; Tue, 25 Sep 2018 03:14:31 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6E1C68E0072; Tue, 25 Sep 2018 03:14:31 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4BBEF8E007B; Tue, 25 Sep 2018 03:14:31 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by kanga.kvack.org (Postfix) with ESMTP id 08EEA8E0072 for ; Tue, 25 Sep 2018 03:14:31 -0400 (EDT) Received: by mail-pf1-f200.google.com with SMTP id u13-v6so11750977pfm.8 for ; Tue, 25 Sep 2018 00:14:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 18/21] swap: Support PMD swap mapping in mincore()
Date: Tue, 25 Sep 2018 15:13:45 +0800
Message-Id: <20180925071348.31458-19-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache is looked up.
If the resulting page isn't a compound page, the huge swap cluster has
been split; the PMD swap mapping is then split too, and we fall back
to PTE swap mapping processing.
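A hedged userspace illustration of what this means for callers:
mincore() on a PMD-sized region backed by a PMD swap mapping reports
per-page presence from the swap cache.  4kB base pages and a 2MB
region are assumed here for simplicity:

	#include <sys/mman.h>

	#define SZ_2M	(2UL << 20)	/* assumes HPAGE_PMD_SIZE == 2MB */
	#define PAGE_SZ	4096UL		/* assumes 4kB base pages */

	/* Count how many 4kB pages of a 2MB region mincore() reports present. */
	static long count_present(void *p)
	{
		unsigned char vec[SZ_2M / PAGE_SZ];
		long i, n = 0;

		if (mincore(p, SZ_2M, vec))
			return -1;
		for (i = 0; i < (long)(SZ_2M / PAGE_SZ); i++)
			n += vec[i] & 1;
		return n;
	}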
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/mincore.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index a66f2052c7b1..a2a66c3c8c6a 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
  * and is up to date; i.e. that no page-in operation would be required
  * at this time if an application were to map and access this page.
  */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff,
+				  bool *compound)
 {
 	unsigned char present = 0;
 	struct page *page;
@@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 #endif
 	if (page) {
 		present = PageUptodate(page);
+		if (compound)
+			*compound = PageCompound(page);
 		put_page(page);
 	}
 
@@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			vec[i] = mincore_page(vma->vm_file->f_mapping,
+					      pgoff, NULL);
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *ptep;
 	unsigned char *vec = walk->private;
 	int nr = (end - addr) >> PAGE_SHIFT;
+	swp_entry_t entry;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
-		memset(vec, 1, nr);
+		unsigned char val = 1;
+		bool compound;
+
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) {
+			entry = pmd_to_swp_entry(*pmd);
+			if (!non_swap_entry(entry)) {
+				val = mincore_page(swap_address_space(entry),
+						   swp_offset(entry),
+						   &compound);
+				/*
+				 * The huge swap cluster has been
+				 * split under us
+				 */
+				if (!compound) {
+					__split_huge_swap_pmd(vma, addr, pmd);
+					spin_unlock(ptl);
+					goto fallback;
+				}
+			}
+		}
+		memset(vec, val, nr);
 		spin_unlock(ptl);
 		goto out;
 	}
 
+fallback:
 	if (pmd_trans_unstable(pmd)) {
 		__mincore_unmapped_range(addr, end, vma, vec);
 		goto out;
@@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
+			entry = pte_to_swp_entry(pte);
 			if (non_swap_entry(entry)) {
 				/*
 				 * migration or hwpoison entries are always
@@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			} else {
 #ifdef CONFIG_SWAP
 				*vec = mincore_page(swap_address_space(entry),
-						    swp_offset(entry));
+						    swp_offset(entry), NULL);
 #else
 				WARN_ON(1);
 				*vec = 1;

From patchwork Tue Sep 25 07:13:46 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613537
From patchwork Tue Sep 25 07:13:46 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613537
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 19/21] swap: Support PMD swap mapping in common path
Date: Tue, 25 Sep 2018 15:13:46 +0800
Message-Id: <20180925071348.31458-20-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

The original code in these common paths handles only PMD migration
entries; revise it to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
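The core of the change is visible in the mm/gup.c hunks below: a non-present PMD is no longer assumed to be a migration entry, it may now also be a true swap mapping. A self-contained userspace model of the new dispatch (all types and helper names are simplified stand-ins, not kernel API):

/*
 * Model of the non-present PMD dispatch in follow_pmd_mask()
 * (see the mm/gup.c hunks below).
 */
#include <stdbool.h>
#include <stdio.h>

enum entry_kind { MIGRATION_ENTRY, TRUE_SWAP_ENTRY, OTHER_ENTRY };

static const char *classify(enum entry_kind e, bool thp_migration,
			    bool thp_swap)
{
	if (thp_migration && e == MIGRATION_ENTRY)
		return "wait for migration to finish, then retry";
	if (thp_swap && e == TRUE_SWAP_ENTRY)
		return "fail the lookup: page is swapped out";	/* no_page_table() */
	return "unexpected entry: warn and fail";		/* WARN_ON(1) */
}

int main(void)
{
	printf("%s\n", classify(MIGRATION_ENTRY, true, true));
	printf("%s\n", classify(TRUE_SWAP_ENTRY, true, true));
	printf("%s\n", classify(OTHER_ENTRY, true, true));
	return 0;
}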
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- fs/proc/task_mmu.c | 12 +++++------- mm/gup.c | 36 ++++++++++++++++++++++++------------ mm/huge_memory.c | 7 ++++--- mm/mempolicy.c | 2 +- 4 files changed, 34 insertions(+), 23 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5ea1d64cb0b4..2d968523c57b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -972,7 +972,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, pmd = pmd_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { + } else if (is_swap_pmd(pmd)) { pmd = pmd_swp_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } @@ -1302,9 +1302,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pm->show_pfn) frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - else if (is_swap_pmd(pmd)) { + } else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && + is_swap_pmd(pmd)) { swp_entry_t entry = pmd_to_swp_entry(pmd); unsigned long offset; @@ -1317,10 +1316,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, flags |= PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |= PM_SOFT_DIRTY; - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - page = migration_entry_to_page(entry); + if (is_pmd_migration_entry(pmd)) + page = migration_entry_to_page(entry); } -#endif if (page && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; diff --git a/mm/gup.c b/mm/gup.c index 1abc8b4afff6..b35b7729b1b7 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -216,6 +216,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spinlock_t *ptl; struct page *page; struct mm_struct *mm = vma->vm_mm; + swp_entry_t entry; pmd = pmd_offset(pudp, address); /* @@ -243,18 +244,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (!pmd_present(pmdval)) { if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmdval)); - if (is_pmd_migration_entry(pmdval)) + entry = pmd_to_swp_entry(pmdval); + if (thp_migration_supported() && is_migration_entry(entry)) { pmd_migration_entry_wait(mm, pmd); - pmdval = READ_ONCE(*pmd); - /* - * MADV_DONTNEED may convert the pmd to null because - * mmap_sem is held in read mode - */ - if (pmd_none(pmdval)) + pmdval = READ_ONCE(*pmd); + /* + * MADV_DONTNEED may convert the pmd to null because + * mmap_sem is held in read mode + */ + if (pmd_none(pmdval)) + return no_page_table(vma, flags); + goto retry; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) return no_page_table(vma, flags); - goto retry; + WARN_ON(1); + return no_page_table(vma, flags); } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); @@ -276,11 +281,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); } if (unlikely(!pmd_present(*pmd))) { + entry = pmd_to_swp_entry(*pmd); spin_unlock(ptl); if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - pmd_migration_entry_wait(mm, pmd); - goto retry_locked; + if (thp_migration_supported() && is_migration_entry(entry)) { + pmd_migration_entry_wait(mm, pmd); + goto retry_locked; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) + return no_page_table(vma, 
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index accbd54d0ed4..8eb16d34ea44 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2086,7 +2086,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
 #ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(is_pmd_migration_entry(pmd)))
+	if (unlikely(is_swap_pmd(pmd)))
 		pmd = pmd_swp_mksoft_dirty(pmd);
 	else if (pmd_present(pmd))
 		pmd = pmd_mksoft_dirty(pmd);
@@ -2172,11 +2172,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	preserve_write = prot_numa && pmd_write(*pmd);
 	ret = 1;
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) &&
+			  !is_migration_entry(entry));
 		if (is_write_migration_entry(entry)) {
 			pmd_t newpmd;
 			/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cfd26d7e61a1..0944ee344658 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	struct queue_pages *qp = walk->private;
 	unsigned long flags;
 
-	if (unlikely(is_pmd_migration_entry(*pmd))) {
+	if (unlikely(is_swap_pmd(*pmd))) {
 		ret = 1;
 		goto unlock;
 	}
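The fs/proc/task_mmu.c hunks above change what /proc/pid/pagemap reports for a huge PMD that maps swapped-out memory. The bit layout used below is the documented pagemap ABI (bit 63 present, bit 62 swapped, swap type in bits 0-4, swap offset in bits 5-54); the reader itself is a sketch, not part of the patch:

/* Sketch: read and decode the pagemap entry for one virtual address. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int pagemap_entry(void *addr, uint64_t *ent)
{
	int fd = open("/proc/self/pagemap", O_RDONLY);
	size_t page = sysconf(_SC_PAGESIZE);
	off_t off = ((uintptr_t)addr / page) * sizeof(uint64_t);
	ssize_t n = -1;

	if (fd >= 0) {
		n = pread(fd, ent, sizeof(*ent), off);
		close(fd);
	}
	return n == sizeof(*ent) ? 0 : -1;
}

int main(void)
{
	uint64_t ent;
	int x = 42;	/* probe this variable's page */

	if (pagemap_entry(&x, &ent))
		return 1;
	printf("present=%d swapped=%d\n",
	       (int)(ent >> 63 & 1), (int)(ent >> 62 & 1));
	if (ent >> 62 & 1)
		printf("swap type=%llu offset=%llu\n",
		       (unsigned long long)(ent & 0x1f),
		       (unsigned long long)((ent >> 5) & ((1ULL << 50) - 1)));
	return 0;
}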
From patchwork Tue Sep 25 07:13:47 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613539
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 20/21] swap: create PMD swap mapping when unmap the THP
Date: Tue, 25 Sep 2018 15:13:47 +0800
Message-Id: <20180925071348.31458-21-ying.huang@intel.com>
In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com>
References: <20180925071348.31458-1-ying.huang@intel.com>

This is the final step of the THP swapin support. When an anonymous
THP is reclaimed, after allocating the huge swap cluster and adding
the THP to the swap cache, the PMD page mapping is changed to a
mapping to the swap space. Previously, the PMD page mapping was split
before being changed. In this patch, the unmap code is enhanced not
to split the PMD mapping but to replace it with a PMD swap mapping
instead. So later, when the SWAP_HAS_CACHE flag is cleared in the
last step of swapout, the huge swap cluster is kept instead of being
split, and during swapin the huge swap cluster is read in one piece
into a THP. That is, the THP is not split during swapout/swapin,
which eliminates the splitting/collapsing overhead and reduces the
page fault count. More importantly, THP utilization improves greatly:
many more THPs are kept when swapping is used, so we can take full
advantage of THP, including its high swapout/swapin performance.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
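In other words, the patch reorders the reclaim steps for an anonymous THP. The model below only traces the intended sequence; the step names refer to the kernel functions involved, while the program itself is illustrative, not kernel code:

/* High-level model of anonymous-THP reclaim after this patch. */
#include <stdio.h>

int main(void)
{
	/* 1. Allocate a huge swap cluster and add the THP to the
	 *    swap cache: add_to_swap(). */
	puts("add_to_swap: huge cluster allocated, THP in swap cache");

	/* 2. Unmap: instead of splitting the huge PMD
	 *    (TTU_SPLIT_HUGE_PMD), install a PMD swap mapping:
	 *    try_to_unmap() -> set_pmd_swap_entry(). */
	puts("try_to_unmap: PMD swap mapping installed, THP kept whole");

	/* 3. Write the THP to swap space in one piece: pageout(). */
	puts("pageout: whole-THP write submitted");

	/* 4. On completion, clear SWAP_HAS_CACHE; the cluster stays
	 *    huge, so a later swapin reads the THP in one piece. */
	puts("swap cache released: cluster stays huge for swapin");
	return 0;
}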
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 11 +++++++++++ mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++ mm/rmap.c | 43 ++++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 6 +----- 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 6586c1bfac21..8cbce31bc090 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -405,6 +405,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +struct page_vma_mapped_walk; + #ifdef CONFIG_THP_SWAP extern void __split_huge_swap_pmd(struct vm_area_struct *vma, unsigned long haddr, @@ -412,6 +414,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma, extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, pmd_t pmdval); static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) @@ -453,6 +457,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) return 0; } +static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, + pmd_t pmdval) +{ + return false; +} + static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8eb16d34ea44..2d263771b614 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1884,6 +1884,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) count_vm_event(THP_SWPIN_FALLBACK); goto fallback; } + +bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page, + unsigned long address, pmd_t pmdval) +{ + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + pmd_t swp_pmd; + swp_entry_t entry = { .val = page_private(page) }; + + if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return false; + } + if (list_empty(&mm->mmlist)) { + spin_lock(&mmlist_lock); + if (list_empty(&mm->mmlist)) + list_add(&mm->mmlist, &init_mm.mmlist); + spin_unlock(&mmlist_lock); + } + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw->pmd, swp_pmd); + + page_remove_rmap(page, true); + put_page(page); + return true; +} #endif static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) diff --git a/mm/rmap.c b/mm/rmap.c index 3bb4be720bc0..a180cb1fe2db 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, continue; } + address = pvmw.address; + +#ifdef CONFIG_THP_SWAP + /* PMD-mapped THP swap entry */ + if (IS_ENABLED(CONFIG_THP_SWAP) && + !pvmw.pte && PageAnon(page)) { + pmd_t pmdval; + + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + flush_cache_range(vma, address, + address + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(mm, address, + address + HPAGE_PMD_SIZE); + if 
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0b63d9a2dc17..d93fc19b6d93 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1319,11 +1319,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
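Note how set_pmd_swap_entry() above works at cluster granularity: one swap_duplicate() call takes HPAGE_PMD_NR references at once, and the mm counters move by the same batch. A toy userspace model of that accounting (the helper name below is made up for illustration, not the kernel's):

/* Toy model of the batched reference take in set_pmd_swap_entry(). */
#include <stdio.h>

#define HPAGE_PMD_NR 512		/* 2MB THP / 4kB base pages */

static int swap_map[HPAGE_PMD_NR];	/* per-entry reference counts */

/* take one reference on nr consecutive swap entries from offset */
static int swap_duplicate_batch(int offset, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		swap_map[offset + i]++;
	return 0;
}

int main(void)
{
	long mm_anonpages = HPAGE_PMD_NR, mm_swapents = 0;

	if (swap_duplicate_batch(0, HPAGE_PMD_NR) == 0) {
		mm_anonpages -= HPAGE_PMD_NR;	/* THP no longer mapped */
		mm_swapents += HPAGE_PMD_NR;	/* 512 swap entries held */
	}
	printf("anon=%ld swapents=%ld map[0]=%d\n",
	       mm_anonpages, mm_swapents, swap_map[0]);
	return 0;
}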
From patchwork Tue Sep 25 07:13:48 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10613541
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan, Dan Williams
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan , Dan Williams Subject: [PATCH -V5 RESEND 21/21] swap: Update help of CONFIG_THP_SWAP Date: Tue, 25 Sep 2018 15:13:48 +0800 Message-Id: <20180925071348.31458-22-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180925071348.31458-1-ying.huang@intel.com> References: <20180925071348.31458-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The help of CONFIG_THP_SWAP is updated to reflect the latest progress of THP (Tranparent Huge Page) swap optimization. Signed-off-by: "Huang, Ying" Reviewed-by: Dan Williams Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index b7f7fb145d0f..061d4e824506 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -419,8 +419,6 @@ config THP_SWAP depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP help Swap transparent huge pages in one piece, without splitting. - XXX: For now, swap cluster backing transparent huge page - will be split after swapout. For selection by architectures with reasonable THP sizes.