From patchwork Wed Sep 12 00:43:54 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Wed, 12 Sep 2018 08:43:54 +0800
Message-Id: <20180912004414.22583-2-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things beyond actual swap, such as page migration.  We already support
THP/PMD "swap entries" for page migration, and the functions behind them
are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION).  But we also need them for the THP
swap optimization, so a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY)
is added.  It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION or
CONFIG_THP_SWAP is enabled, and the PMD swap entry functions are tied to
this new config option instead.  Functions enabled by
CONFIG_ARCH_ENABLE_THP_MIGRATION that are used for page migration only
remain gated by that option.
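[Editorial sketch, not part of the patch: with CONFIG_HAVE_PMD_SWAP_ENTRY
selected by either of its users, generic code can convert between a PMD
and a swap entry without caring whether the entry exists for migration or
for actual swap.  pmd_matches_swp_entry() below is a hypothetical helper;
pmd_to_swp_entry() and the stub behavior are taken from the patch.]

    /*
     * Hypothetical helper to illustrate the gating; only
     * pmd_to_swp_entry() and swp_entry(0, 0) come from the patch.
     */
    #ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
    static inline bool pmd_matches_swp_entry(pmd_t pmd, swp_entry_t entry)
    {
            /* Real conversion, available for THP swap *and* migration. */
            return pmd_to_swp_entry(pmd).val == entry.val;
    }
    #else
    static inline bool pmd_matches_swp_entry(pmd_t pmd, swp_entry_t entry)
    {
            /* Without the option, pmd_to_swp_entry() stubs to swp_entry(0, 0). */
            return false;
    }
    #endif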
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h        | 44 ++++++++++++++++++++++--------------------
 mm/Kconfig                     |  8 ++++++++
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e4ffa565a69f..194f97dc4583 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1334,7 +1334,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
         return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
         return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 5657a20e0c59..eb1e9d17371b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
         return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 22af9d8a84ae..79ccbf8789d5 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -259,17 +259,7 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-                struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-                struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
         swp_entry_t arch_entry;
@@ -287,6 +277,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
         arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
         return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+        return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+        return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+                struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+                struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -307,16 +319,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-        return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-        return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
         return 0;
diff --git a/mm/Kconfig b/mm/Kconfig
index 7bf074bf79e5..9a6e7e27e8d5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -430,6 +430,14 @@ config THP_SWAP
           For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+        def_bool y
+        depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
         def_bool y
         depends on TRANSPARENT_HUGEPAGE

From patchwork Wed Sep 12 00:43:55 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 02/21] swap: Add __swap_duplicate_locked()
Date: Wed, 12 Sep 2018 08:43:55 +0800
Message-Id: <20180912004414.22583-3-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is separated
into a new function, __swap_duplicate_locked(), because we will add more
PMD swap mapping logic to __swap_duplicate() while keeping most of the
PTE swap mapping logic in __swap_duplicate_locked().  This is purely
mechanical code refactoring; there is no functional change in this
patch.
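[Editorial sketch: the refactoring pattern, as a user-space analogy (not
kernel code).  The mutex stands in for the cluster lock and the table for
p->swap_map[]; all names below are invented for illustration.]

    #include <pthread.h>

    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
    static int table[64];

    /* Core logic that requires the lock, like __swap_duplicate_locked(). */
    static int dup_locked(unsigned long offset)
    {
            if (table[offset] < 0)          /* invalid slot */
                    return -1;
            table[offset]++;                /* increment map count */
            return 0;
    }

    /* Wrapper doing lookup and locking, like __swap_duplicate(). */
    static int dup_entry(unsigned long offset)
    {
            int err;

            pthread_mutex_lock(&table_lock);
            err = dup_locked(offset);
            pthread_mutex_unlock(&table_lock);
            return err;
    }

Splitting the function this way lets a later patch loop over many slots
under a single lock acquisition, which is what the PMD swap mapping
support in the next patch does.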
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 97a1bd1a7c9a..6a570ef00fa7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3436,32 +3436,12 @@ void si_swapinfo(struct sysinfo *val)
         spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+                                   unsigned long offset, unsigned char usage)
 {
-        struct swap_info_struct *p;
-        struct swap_cluster_info *ci;
-        unsigned long offset;
         unsigned char count;
         unsigned char has_cache;
-        int err = -EINVAL;
-
-        p = get_swap_device(entry);
-        if (!p)
-                goto out;
-
-        offset = swp_offset(entry);
-        ci = lock_cluster_or_swap_info(p, offset);
+        int err = 0;
 
         count = p->swap_map[offset];
 
@@ -3471,12 +3451,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
          */
         if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
                 err = -ENOENT;
-                goto unlock_out;
+                goto out;
         }
 
         has_cache = count & SWAP_HAS_CACHE;
         count &= ~SWAP_HAS_CACHE;
-        err = 0;
 
         if (usage == SWAP_HAS_CACHE) {
 
@@ -3503,11 +3482,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
         p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+        return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+        struct swap_info_struct *p;
+        struct swap_cluster_info *ci;
+        unsigned long offset;
+        int err = -EINVAL;
+
+        p = get_swap_device(entry);
+        if (!p)
+                goto out;
+
+        offset = swp_offset(entry);
+        ci = lock_cluster_or_swap_info(p, offset);
+        err = __swap_duplicate_locked(p, offset, usage);
         unlock_cluster_or_swap_info(p, ci);
+
+        put_swap_device(p);
 out:
-        if (p)
-                put_swap_device(p);
         return err;
 }

From patchwork Wed Sep 12 00:43:56 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Wed, 12 Sep 2018 08:43:56 +0800
Message-Id: <20180912004414.22583-4-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

To support swapping in a THP in one piece, we need to create PMD swap
mappings during swapout and maintain the PMD swap mapping count.
This patch implements the support to increase the PMD swap mapping count
(for swapout, fork, etc.) and to set the SWAP_HAS_CACHE flag (for
swapin, etc.) for a huge swap cluster in the swap_duplicate() function
family.  Although it implements only part of the design of the swap
reference count with PMD swap mapping, the whole design is described
below to make the patch and the whole picture easier to understand.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP becomes a PMD swap mapping
to the huge swap cluster via a swap entry in the PMD, while a PTE page
mapping to a subpage of the THP becomes a PTE swap mapping to a swap
slot in the huge swap cluster via a swap entry in the PTE.  If there is
no PMD swap mapping and the corresponding THP is removed from the page
cache (reclaimed), the huge swap cluster is split and becomes a normal
swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + the PMD swap mapping count.  Because
all swap slots in the huge swap cluster are mapped by PTE or PMD, or
have the SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR.  The PMD swap mapping count is recorded too, to make it
easy to determine whether there are remaining PMD swap mappings.

The count in swap_map[offset] is the sum of the PTE and PMD swap mapping
counts.  This means that when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster.  An alternative would be for swap_map[offset] to record the PTE
swap map count only, given that we have recorded the PMD swap mapping
count in the count of the huge swap cluster.  But that would require
increasing swap_map[offset] when splitting the PMD swap mapping, which
may fail because of the memory allocation for the swap count
continuation.  That is hard to deal with, so we chose the current
solution.

The PMD swap mapping to a huge swap cluster may be split when unmapping
part of the PMD mapping, etc.  That is easy, because only the count of
the huge swap cluster needs to be changed.  When the last PMD swap
mapping is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
cluster (clear the huge flag).  This makes it easy to reason about the
cluster state.

A huge swap cluster will be split when splitting the THP in the swap
cache, or when failing to allocate a THP during swapin, etc.  But when
splitting the huge swap cluster, we will not try to split all PMD swap
mappings, because we sometimes lack the information needed for that.
Later, when the PMD swap mapping is duplicated or swapped in, etc., the
PMD swap mapping will be split and fall back to the PTE operation.

When a THP is added into the swap cache, the SWAP_HAS_CACHE flag will be
set in swap_map[offset] of all swap slots inside the huge swap cluster
backing the THP.  This huge swap cluster will not be split unless the
THP is split, even if its PMD swap mapping count drops to 0.  Later,
when the THP is removed from the swap cache, the SWAP_HAS_CACHE flag
will be cleared in swap_map[offset] of all swap slots inside the huge
swap cluster, and the huge swap cluster will be split if its PMD swap
mapping count is 0.

The first parameter of swap_duplicate() is changed to return the swap
entry for which to call add_swap_count_continuation(), because we may
need to call it for a swap entry in the middle of a huge swap cluster.
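[Editorial sketch: the accounting above, made concrete as a small
stand-alone C program.  It assumes HPAGE_PMD_NR = 512 as on x86-64; the
variables are illustrative, not kernel code.]

    #include <assert.h>

    #define SWAPFILE_CLUSTER 512                /* HPAGE_PMD_NR on x86-64 */

    int main(void)
    {
            /* After swapping out one THP there is one PMD swap mapping: */
            int pmd_map_count = 1;
            int cluster_count = SWAPFILE_CLUSTER + pmd_map_count;   /* 513 */

            /* cluster_swapcount() recovers the PMD mapping count: */
            assert(cluster_count - SWAPFILE_CLUSTER == 1);

            /*
             * fork() duplicating the PMD swap mapping adds one to the
             * cluster count and to swap_map[i] of every slot, because
             * swap_map[i] = PTE map count + PMD map count for slot i.
             */
            pmd_map_count++;
            cluster_count = SWAPFILE_CLUSTER + pmd_map_count;       /* 514 */
            int swap_map_i = 0 /* PTE maps */ + pmd_map_count;      /* 2 */

            assert(cluster_count == 514 && swap_map_i == 2);
            return 0;
    }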
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 9 +++-- mm/memory.c | 2 +- mm/rmap.c | 2 +- mm/swap_state.c | 2 +- mm/swapfile.c | 107 ++++++++++++++++++++++++++++++++++++++++++--------- 5 files changed, 97 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index ca7c6307bda7..1bee8b65cb8a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); -extern int swap_duplicate(swp_entry_t); -extern int swapcache_prepare(swp_entry_t); +extern int swap_duplicate(swp_entry_t *entry, int entry_size); +extern int swapcache_prepare(swp_entry_t entry, int entry_size); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); @@ -510,7 +510,8 @@ static inline void show_swap_cache_info(void) } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) -#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define swapcache_prepare(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { @@ -521,7 +522,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp) { } -static inline int swap_duplicate(swp_entry_t swp) +static inline int swap_duplicate(swp_entry_t *swp, int entry_size) { return 0; } diff --git a/mm/memory.c b/mm/memory.c index f0e0c14d17a4..ba3657a91980 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -956,7 +956,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, swp_entry_t entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { - if (swap_duplicate(entry) < 0) + if (swap_duplicate(&entry, 1) < 0) return entry.val; /* make sure dst_mm is on swapoff's mmlist. */ diff --git a/mm/rmap.c b/mm/rmap.c index 1e79fac3186b..3bb4be720bc0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, break; } - if (swap_duplicate(entry) < 0) { + if (swap_duplicate(&entry, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/swap_state.c b/mm/swap_state.c index dc312559f7df..8b2fd7b97e25 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry); + err = swapcache_prepare(entry, 1); if (err == -EEXIST) { radix_tree_preload_end(); /* diff --git a/mm/swapfile.c b/mm/swapfile.c index 6a570ef00fa7..138968b79de5 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p, free_cluster(p, idx); } +/* + * When swapout a THP in one piece, PMD page mappings to THP are + * replaced by PMD swap mappings to the corresponding swap cluster. + * cluster_swapcount() returns the PMD swap mapping count. + * + * cluster_count() = PMD swap mapping count + count of allocated swap + * entries in cluster. 
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+        VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+        return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+                                         unsigned int count)
+{
+        VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+        cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+        int count = cluster_swapcount(ci) + add;
+
+        VM_BUG_ON(count < 0);
+        cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
  *
  * Returns error code in following case.
  * - success -> 0
  * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
  * - swap-cache reference is requested but there is already one. -> EEXIST
 * - swap-cache reference is requested but the entry is not used. -> ENOENT
 * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
 */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+                            unsigned char usage)
 {
         struct swap_info_struct *p;
         struct swap_cluster_info *ci;
         unsigned long offset;
         int err = -EINVAL;
+        int i, size = swap_entry_size(entry_size);
 
-        p = get_swap_device(entry);
+        p = get_swap_device(*entry);
         if (!p)
                 goto out;
 
-        offset = swp_offset(entry);
+        offset = swp_offset(*entry);
         ci = lock_cluster_or_swap_info(p, offset);
-        err = __swap_duplicate_locked(p, offset, usage);
+        if (size == SWAPFILE_CLUSTER) {
+                /*
+                 * The huge swap cluster has been split, for example, failed to
+                 * allocate huge page during swapin, the caller should split
+                 * the PMD swap mapping and operate on normal swap entries.
+                 */
+                if (!cluster_is_huge(ci)) {
+                        err = -ENOTDIR;
+                        goto unlock;
+                }
+                VM_BUG_ON(!IS_ALIGNED(offset, size));
+                /* If cluster is huge, all swap entries inside is in-use */
+                VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+        }
+        /* p->swap_map[] = PMD swap map count + PTE swap map count */
+        for (i = 0; i < size; i++) {
+                err = __swap_duplicate_locked(p, offset + i, usage);
+                if (err && size != 1) {
+                        *entry = swp_entry(p->type, offset + i);
+                        goto undup;
+                }
+        }
+        if (size == SWAPFILE_CLUSTER && usage == 1)
+                cluster_add_swapcount(ci, usage);
+unlock:
         unlock_cluster_or_swap_info(p, ci);
 
         put_swap_device(p);
 out:
         return err;
+undup:
+        for (i--; i >= 0; i--)
+                __swap_entry_free_locked(p, offset + i, usage);
+        goto unlock;
 }
 
 /*
@@ -3524,36 +3589,42 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
  */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-        __swap_duplicate(entry, SWAP_MAP_SHMEM);
+        __swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
  * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated.  Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *                                                              -> ENOMEM
+ * - otherwise same as __swap_duplicate()
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
         int err = 0;
 
-        while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-                err = add_swap_count_continuation(entry, GFP_ATOMIC);
+        while (!err &&
+               (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+                err = add_swap_count_continuation(*entry, GFP_ATOMIC);
         return err;
 }
 
 /*
  * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
  *
  * Called when allocating swap cache for existing swap entry,
 * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
 */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-        return __swap_duplicate(entry, SWAP_HAS_CACHE);
+        return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
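[Editorial sketch: how a caller of the new interface might handle the
error codes.  copy_huge_swap_pmd_sketch() is an invented name; the error
semantics are those documented in the patch above.]

    static int copy_huge_swap_pmd_sketch(swp_entry_t entry)
    {
            int err = swap_duplicate(&entry, HPAGE_PMD_NR);

            if (err == -ENOTDIR) {
                    /*
                     * The huge swap cluster was split under us: split
                     * the PMD swap mapping too and fall back to
                     * duplicating normal (PTE-sized) swap entries.
                     */
            } else if (err == -ENOMEM) {
                    /*
                     * swap_duplicate() already retried with
                     * add_swap_count_continuation(*entry, GFP_ATOMIC);
                     * entry now names the slot in the middle of the
                     * cluster that still needs a continuation page, so
                     * a caller that may sleep can call
                     * add_swap_count_continuation(entry, GFP_KERNEL)
                     * and retry.
                     */
            }
            return err;
    }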
From patchwork Wed Sep 12 00:43:57 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Wed, 12 Sep 2018 08:43:57 +0800
Message-Id: <20180912004414.22583-5-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and
replaced with PTE swap mappings, and when clearing the SWAP_HAS_CACHE
flag for the huge swap cluster in put_swap_page(), the huge swap cluster
was split.  Now, during swapout, the PMD page mappings to the THP are
changed to PMD swap mappings to the corresponding swap cluster.  So when
clearing the SWAP_HAS_CACHE flag, the huge swap cluster is split only if
the PMD swap mapping count is 0; otherwise, we keep it as a huge swap
cluster, so that we can swap in the THP in one piece later.
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 138968b79de5..553d2551b35a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP. This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0. Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster. And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
  */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
         ci = lock_cluster_or_swap_info(si, offset);
         if (size == SWAPFILE_CLUSTER) {
-                VM_BUG_ON(!cluster_is_huge(ci));
+                VM_BUG_ON(!IS_ALIGNED(offset, size));
                 map = si->swap_map + offset;
-                for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-                        val = map[i];
-                        VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-                        if (val == SWAP_HAS_CACHE)
-                                free_entries++;
+                /*
+                 * No PMD swap mapping, the swap cluster will be freed
+                 * if all swap entries becoming free, otherwise the
+                 * huge swap cluster will be split.
+                 */
+                if (!cluster_swapcount(ci)) {
+                        for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+                                val = map[i];
+                                VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+                                if (val == SWAP_HAS_CACHE)
+                                        free_entries++;
+                        }
+                        if (free_entries != SWAPFILE_CLUSTER)
+                                cluster_clear_huge(ci);
                 }
-                cluster_clear_huge(ci);
                 if (free_entries == SWAPFILE_CLUSTER) {
                         unlock_cluster_or_swap_info(si, ci);
                         spin_lock(&si->lock);
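[Editorial sketch: the decision put_swap_page() now makes for a huge swap
cluster, modeled as a stand-alone program.  This is a simplification
that ignores locking and reclaim; drop_swapcache() is an invented name,
and 0x40 is assumed for SWAP_HAS_CACHE.]

    #include <stdio.h>

    #define SWAPFILE_CLUSTER 512
    #define SWAP_HAS_CACHE   0x40

    enum action { FREE_CLUSTER, SPLIT_CLUSTER, KEEP_HUGE };

    static enum action drop_swapcache(int pmd_swapcount,
                                      const unsigned char map[SWAPFILE_CLUSTER])
    {
            int free_entries = 0, i;

            if (pmd_swapcount)      /* still PMD-mapped: keep in one piece */
                    return KEEP_HUGE;
            for (i = 0; i < SWAPFILE_CLUSTER; i++)
                    if (map[i] == SWAP_HAS_CACHE)   /* no other reference */
                            free_entries++;
            return free_entries == SWAPFILE_CLUSTER ? FREE_CLUSTER : SPLIT_CLUSTER;
    }

    int main(void)
    {
            unsigned char map[SWAPFILE_CLUSTER] =
                    { [0 ... SWAPFILE_CLUSTER - 1] = SWAP_HAS_CACHE };

            printf("%d\n", drop_swapcache(0, map)); /* FREE_CLUSTER */
            printf("%d\n", drop_swapcache(2, map)); /* KEEP_HUGE */
            map[3] |= 1;                    /* one slot still PTE-mapped */
            printf("%d\n", drop_swapcache(0, map)); /* SPLIT_CLUSTER */
            return 0;
    }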
From patchwork Wed Sep 12 00:43:58 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free()
Date: Wed, 12 Sep 2018 08:43:58 +0800
Message-Id: <20180912004414.22583-6-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

When a PMD swap mapping is removed from a huge swap cluster, for example
when unmapping a memory range mapped with a PMD swap mapping,
free_swap_and_cache() will be called to decrease the reference count of
the huge swap cluster.  free_swap_and_cache() may also free or split the
huge swap cluster, and free the corresponding THP in the swap cache if
necessary.  swap_free() is similar and shares most of its implementation
with free_swap_and_cache().  This patch revises free_swap_and_cache()
and swap_free() to implement this.

If the swap cluster has already been split, for example because of a
failure to allocate a THP during swapin, we just decrease the reference
count of each swap slot by one.  Otherwise, we decrease the reference
count of each swap slot and the PMD swap mapping count in
cluster_count() by one.  When the corresponding THP isn't in the swap
cache: if the PMD swap mapping count becomes 0, the huge swap cluster
will be split, and if all swap counts become 0, the huge swap cluster
will be freed.  When the corresponding THP is in the swap cache: if
every swap_map[offset] == SWAP_HAS_CACHE, we try to delete the THP from
the swap cache, which causes the THP and the huge swap cluster to be
freed.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/s390/mm/pgtable.c | 2 +- include/linux/swap.h | 9 +-- kernel/power/swap.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 4 +- mm/shmem.c | 6 +- mm/swapfile.c | 171 ++++++++++++++++++++++++++++++++++++++----------- 7 files changed, 149 insertions(+), 49 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..ffd4b68adbb3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) dec_mm_counter(mm, mm_counter(page)); } - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); } void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/swap.h b/include/linux/swap.h index 1bee8b65cb8a..db3e07a3d9bc 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t *entry, int entry_size); extern int swapcache_prepare(swp_entry_t entry, int entry_size); -extern void swap_free(swp_entry_t); +extern void swap_free(swp_entry_t entry, int entry_size); extern void swapcache_free_entries(swp_entry_t *entries, int n); -extern int free_swap_and_cache(swp_entry_t); +extern int free_swap_and_cache(swp_entry_t entry, int entry_size); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); @@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void) { } -#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define free_swap_and_cache(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) #define swapcache_prepare(e, s) \ ({(is_migration_entry(e) || is_device_private_entry(e)); }) @@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size) return 0; } -static inline void swap_free(swp_entry_t swp) +static inline void swap_free(swp_entry_t swp, int entry_size) { } diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7f6c1a288d3..0275df84ed3d 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap) offset = swp_offset(get_swap_page_of_type(swap)); if (offset) { if (swsusp_extents_insert(offset)) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); else return swapdev_block(swap, offset); } @@ -206,7 +206,7 @@ void free_all_swap_pages(int swap) ext = rb_entry(node, struct swsusp_extent, node); rb_erase(node, &swsusp_extents); for (offset = ext->start; offset <= ext->end; offset++) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); kfree(ext); } diff --git a/mm/madvise.c b/mm/madvise.c index 972a9eaa898b..6fff1c1d2009 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (non_swap_entry(entry)) continue; nr_swap--; - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); continue; } diff --git a/mm/memory.c b/mm/memory.c index ba3657a91980..e01e27afd2e8 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1381,7 +1381,7 @@ 
static unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -3066,7 +3066,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index 47bb74fc97fb..279074e46f83 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -677,7 +677,7 @@ static int shmem_free_swap(struct address_space *mapping, xa_unlock_irq(&mapping->i_pages); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1212,7 +1212,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_lock_irq(&info->lock); info->swapped--; spin_unlock_irq(&info->lock); - swap_free(swap); + swap_free(swap, 1); } } return error; @@ -1751,7 +1751,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); } else { if (vma && userfaultfd_missing(vma)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 553d2551b35a..e06cc1581d1e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. 
+ * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. + */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap, } mmput(start_mm); + + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. + * find_next_to_unuse() will skip these swap entries, that is + * OK. But we need to wait until they are freed really. 
+ */ while (!retval && READ_ONCE(si->inuse_pages)) schedule_timeout_uninterruptible(1); + return retval; }
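For readers following the series, the decision flow that this patch adds to __swap_free() can be modeled in a few lines. The following is a simplified standalone C sketch, not kernel code: locking, swap cache reclaim, and swap count continuation are omitted, and the toy_cluster structure is an invented stand-in for struct swap_cluster_info plus its swap_map slots.

    #include <stdbool.h>
    #include <stdio.h>

    #define CLUSTER_SIZE 512                  /* HPAGE_PMD_NR on x86-64 */

    struct toy_cluster {
        int slot_count[CLUSTER_SIZE];         /* per-slot swap counts */
        int pmd_swapcount;                    /* PMD swap mapping count */
        bool huge;                            /* cluster is still huge */
    };

    /* Model of __swap_free(): drop one reference per covered slot; for a
     * still-huge cluster also drop the PMD swap mapping count, then split
     * or free the cluster when the respective counts reach zero. */
    static void toy_swap_free(struct toy_cluster *c, int first, int entry_size)
    {
        int i, in_use = 0;

        if (entry_size > 1 && c->huge)
            c->pmd_swapcount--;
        for (i = 0; i < entry_size; i++)
            c->slot_count[first + i]--;
        if (c->huge && c->pmd_swapcount == 0)
            c->huge = false;                  /* split the huge cluster */
        for (i = 0; i < CLUSTER_SIZE; i++)
            in_use += c->slot_count[i];
        if (in_use == 0)
            printf("whole cluster freed\n");
    }

    int main(void)
    {
        struct toy_cluster c = { .pmd_swapcount = 1, .huge = true };
        int i;

        for (i = 0; i < CLUSTER_SIZE; i++)
            c.slot_count[i] = 1;              /* one PMD swap mapping */
        toy_swap_free(&c, 0, CLUSTER_SIZE);   /* splits, then frees */
        return 0;
    }

The invariant mirrored here is that the huge flag is cleared as soon as the last PMD swap mapping goes away, while the backing slots are only freed once their individual counts also drop to zero.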
From patchwork Wed Sep 12 00:43:59 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596531
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Wed, 12 Sep 2018 08:43:59 +0800
Message-Id: <20180912004414.22583-7-ying.huang@intel.com>

A huge PMD needs to be split when zapping a part of the PMD mapping, among other cases. If the PMD mapping is a swap mapping, we need to split it too. This patch implements support for this. It is similar to splitting a PMD page mapping, except that we also need to decrease the PMD swap mapping count of the huge swap cluster. If the PMD swap mapping count becomes 0, the huge swap cluster will be split.
Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap PMD, so a pmd_present() check must be done before calling them. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 ++++++ mm/huge_memory.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 85 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 99c19b06d9a4..0f3e1739986f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index db3e07a3d9bc..a2a3d85decd9 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c235ba78de68..b8b61a0879f6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1609,6 +1609,40 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, haddr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false.
@@ -2075,7 +2109,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2099,7 +2133,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2143,6 +2177,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2235,14 +2272,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index e06cc1581d1e..16723b9d971a 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -4034,6 +4034,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;
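The heart of the patch above is that one PMD swap mapping over a 2MB region becomes HPAGE_PMD_NR consecutive PTE swap mappings, one per 4KB sub-page, produced by simply incrementing the swap offset. The standalone C model below is a sketch, not kernel code; the output array stands in for the page table withdrawn in __split_huge_swap_pmd(), and the constant assumes x86-64.

    #include <stdio.h>

    #define HPAGE_PMD_NR 512    /* 4KB PTEs per 2MB PMD on x86-64 */

    /* Model of the entry.val++ loop in __split_huge_swap_pmd(): the PTE
     * swap entries of a split PMD swap mapping have consecutive swap
     * offsets starting at the (cluster-aligned) PMD entry's offset. */
    static void split_swap_pmd_model(unsigned long pmd_offset,
                                     unsigned long pte_offsets[HPAGE_PMD_NR])
    {
        int i;

        for (i = 0; i < HPAGE_PMD_NR; i++)
            pte_offsets[i] = pmd_offset + i;
    }

    int main(void)
    {
        unsigned long ptes[HPAGE_PMD_NR];

        split_swap_pmd_model(4096, ptes);     /* cluster-aligned offset */
        printf("first PTE -> offset %lu, last PTE -> offset %lu\n",
               ptes[0], ptes[HPAGE_PMD_NR - 1]);
        return 0;
    }

This works because a huge swap cluster is allocated whole and cluster-aligned, so its sub-entries are guaranteed to be contiguous.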
From patchwork Wed Sep 12 00:44:00 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596533
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Wed, 12 Sep 2018 08:44:00 +0800
Message-Id: <20180912004414.22583-8-ying.huang@intel.com>

When splitting a THP in the swap cache, or when failing to allocate a THP while swapping in a huge swap cluster, the huge swap cluster will be split. In addition to clearing the huge flag of the swap cluster, the PMD swap mapping count recorded in cluster_count() will be set to 0. But we will not touch the PMD swap mappings themselves, because it can be hard to find them all.
When the PMD swap mappings are operated on later, it will be found that the huge swap cluster has already been split, and the PMD swap mappings will be split at that time. Unless we are splitting a THP in the swap cache (specified via the "force" parameter), split_swap_cluster() will return -EEXIST if the SWAP_HAS_CACHE flag is set in swap_map[offset], because this indicates that a THP corresponds to this huge swap cluster and it isn't desirable to split that THP. When splitting a THP in the swap cache, the call to split_swap_cluster() is moved to before unlocking the sub-pages, so that all sub-pages are kept locked from the time the THP is split until the huge swap cluster is split. This makes the code much easier to reason about. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 ++++-- mm/huge_memory.c | 18 ++++++++++------ mm/swapfile.c | 58 +++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index a2a3d85decd9..c0c3b3c077d7 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -616,11 +616,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b8b61a0879f6..64123cefa978 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2502,6 +2502,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, unfreeze_page(head); + /* + * Split swap cluster before unlocking sub-pages. So all + * sub-pages will be kept locked from THP has been split to + * swap cluster is split.
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2728,12 +2739,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index 16723b9d971a..ef2b42c199c0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -4064,6 +4047,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ */ + if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void) From patchwork Wed Sep 12 00:44:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10596535 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4BE5F109C for ; Wed, 12 Sep 2018 00:44:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3928829AC1 for ; Wed, 12 Sep 2018 00:44:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2D14129AD2; Wed, 12 Sep 2018 00:44:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3E00E29AC1 for ; Wed, 12 Sep 2018 00:44:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8FB338E000A; Tue, 11 Sep 2018 20:44:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 884228E0001; Tue, 11 Sep 2018 20:44:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 726518E000A; Tue, 11 Sep 2018 20:44:41 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id 2F0E28E0001 for ; Tue, 11 Sep 2018 20:44:41 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id a23-v6so103818pfo.23 for ; Tue, 11 Sep 2018 17:44:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=qX3WLc3zKTyu5lZ2Sct7yBSZfHZgbd43GmKJeyS43gI=; b=JxF+l55q5Ytp7zVjvt/D4nS0LzEwRNdynhENNlXqDycsxLQk68sxDi4n4LK8zPcyrV 2yRZOb2cBxZCDmzN9av8uaol0b3oPfDBmEJsguLQDa6DNcNBcQ2ieAzbfyOOdfpWkwKv ZhfRKnTKeMCk4kFU//NijZPHG1TfUsgR7jW6w+HUQAnLZfBir+y6iwXO/ep6GbSd4kku xorbv7/CKfolkY3/mkTnJkg0xZOE0L7eScxQHNVvE2niNhw/XFIp56Znj57lRkU4Ys93 pK6oYBCF3vzYNe66rDpgRt51RCMtOu1gq3YbmyeIicD3rz0wrWB4h19Jfh15/mXKqegy W5+Q== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APzg51Biq5fDztEfhTZt/l/HjXRfmLavBGdWgnZmgR9JiciJMx8FdvP8 HoKrOG8xZk93OSmAaJWGXVFKrSvtj8Er4iMXIRuM9JwBsntkHKZBxn66709iwoFWm7hufyLb9hX rClgHxn6GL4ZSx6UdVJroMoujVo9snzNPAw7AvAritQRxyvQixyVNCIJ9vDm0uyGzzA== X-Received: by 2002:a17:902:b60b:: with SMTP id b11-v6mr29097480pls.301.1536713080851; Tue, 11 Sep 2018 17:44:40 -0700 (PDT) X-Google-Smtp-Source: ANB0VdajYJibI5MDjWGaHQbbLVcuSA4PXhDe/uZLXCUGP6nrkh/kI8AqM83GP5zd4pXzllwgiYKv X-Received: by 
From patchwork Wed Sep 12 00:44:01 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596535
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Wed, 12 Sep 2018 08:44:01 +0800
Message-Id: <20180912004414.22583-9-ying.huang@intel.com>

To swap in a THP in one piece, we need to read a huge swap cluster from the swap device. This patch revises __read_swap_cache_async() and its callers and callees to support this.
If __read_swap_cache_async() finds that the swap cluster of the specified swap entry is huge, it will try to allocate a THP and add it into the swap cache, so that the contents of the huge swap cluster can later be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 38 ++++++++++++++++++++++++++ include/linux/swap.h | 4 +-- mm/huge_memory.c | 26 ------------------ mm/swap_state.c | 72 ++++++++++++++++++++++++++++++++++++------------- mm/swapfile.c | 9 ++++--- 5 files changed, 99 insertions(+), 50 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0f3e1739986f..3fdb29bc250c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,39 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +/* + * always: directly stall for all thp allocations + * defer: wake kswapd and fail if not immediately available + * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise + * fail if not immediately available + * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately + * available + * never: never stall for any thp allocation + */ +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + bool vma_madvised; + + if (!vma) + return GFP_TRANSHUGE_LIGHT; + vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? __GFP_DIRECT_RECLAIM : + __GFP_KSWAPD_RECLAIM); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? 
__GFP_DIRECT_RECLAIM : 0); + return GFP_TRANSHUGE_LIGHT; +} #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +396,11 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index c0c3b3c077d7..921abd07e13f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -589,7 +589,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 64123cefa978..f1358681db8f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -620,32 +620,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, } -/* - * always: directly stall for all thp allocations - * defer: wake kswapd and fail if not immediately available - * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise - * fail if not immediately available - * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately - * available - * never: never stall for any thp allocation - */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) -{ - const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); - - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - __GFP_KSWAPD_RECLAIM); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - 0); - return GFP_TRANSHUGE_LIGHT; -} - /* Caller must hold page table lock. */ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, diff --git a/mm/swap_state.c b/mm/swap_state.c index 8b2fd7b97e25..c2516056ec6d 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -385,7 +385,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -411,14 +413,40 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. 
*/ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. */ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp = alloc_hugepage_direct_gfpmask(vma); + + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_hugepage_vma(gfp, vma, + addr, HPAGE_PMD_ORDER); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -426,16 +454,18 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * call radix_tree_preload() while we can wait. */ - err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL); + err = radix_tree_maybe_preload_order(gfp_mask & GFP_KERNEL, + compound_order(new_page)); if (err) break; /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); - if (err == -EEXIST) { + err = swapcache_prepare(hentry, entry_size); + if (err) radix_tree_preload_end(); + if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble * across a SWAP_HAS_CACHE swap_map entry whose page @@ -443,32 +473,35 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; - } - if (err) { /* swp entry is obsolete ? */ - radix_tree_preload_end(); + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; + } else if (err) { /* swp entry is obsolete ? */ break; } /* May fail (-ENOMEM) if radix-tree node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = __add_to_swap_cache(new_page, entry); + err = __add_to_swap_cache(new_page, hentry); + radix_tree_preload_end(); if (likely(!err)) { - radix_tree_preload_end(); /* * Initiate read into locked page and return. */ lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } - radix_tree_preload_end(); __ClearPageLocked(new_page); /* * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -490,7 +523,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -609,8 +642,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -771,8 +805,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index ef2b42c199c0..3fe50f1da0a0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
-int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count;
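One detail of the patch above that is easy to miss: for a huge cluster the swap cache is keyed by the first entry of the cluster, so the faulting entry's offset must be rounded down before lookup and insertion (the hentry computation in __read_swap_cache_async()). Below is a standalone sketch of just that arithmetic; the constant assumes x86-64 and the function name is invented for illustration.

    #include <stdio.h>

    #define HPAGE_PMD_NR 512UL

    /* Mirrors the hentry computation: for a huge swap cluster the swap
     * cache key is the cluster's first entry, so round the offset down. */
    static unsigned long swapin_key_offset(unsigned long offset,
                                           unsigned long entry_size)
    {
        if (entry_size == HPAGE_PMD_NR)
            return offset & ~(HPAGE_PMD_NR - 1);   /* round_down */
        return offset;
    }

    int main(void)
    {
        /* A fault on offset 1234 inside a huge cluster reads the whole
         * cluster starting at offset 1024. */
        printf("%lu\n", swapin_key_offset(1234, HPAGE_PMD_NR)); /* 1024 */
        printf("%lu\n", swapin_key_offset(1234, 1));            /* 1234 */
        return 0;
    }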
From patchwork Wed Sep 12 00:44:02 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596537
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 09/21] swap: Swapin a THP in one piece
Date: Wed, 12 Sep 2018 08:44:02 +0800
Message-Id: <20180912004414.22583-10-ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping, it will swap in a THP in one piece. This avoids the overhead of splitting/collapsing the THP before/after swapping, and greatly improves swap performance thanks to the reduced page fault count, among other factors. do_huge_pmd_swap_page() is added by this patch to implement this. It is similar to do_swap_page() for normal page swapin. If a THP cannot be allocated, the huge swap cluster and the PMD swap mapping will be split, falling back to normal page swapin. If the huge swap cluster has already been split, the PMD swap mapping will be split to fall back to normal page swapin.
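The fallback policy above can be condensed into a small decision model. This is a sketch in standalone C: the state structure and helper names are invented for illustration and only mirror the control flow of do_huge_pmd_swap_page() in the diff below.

    #include <stdbool.h>

    /* Toy state for one huge-page swapin attempt. */
    struct swapin_state {
        bool thp_allocated;     /* read_swap_cache_async() got a THP */
        bool cluster_is_huge;   /* huge swap cluster not yet split */
    };

    enum swapin_path { SWAPIN_HUGE, SWAPIN_FALLBACK };

    /* Model of the policy: swap in a whole THP when the cluster is still
     * huge and a THP could be allocated; otherwise split whatever is
     * still huge (cluster and/or PMD swap mapping, via
     * split_swap_cluster() and split_huge_swap_pmd()) and retry with
     * normal 4KB swapin. */
    static enum swapin_path choose_path(struct swapin_state s)
    {
        if (s.cluster_is_huge && s.thp_allocated)
            return SWAPIN_HUGE;        /* map one PMD entry */
        return SWAPIN_FALLBACK;        /* fall back to do_swap_page() */
    }

    int main(void)
    {
        struct swapin_state ok   = { .thp_allocated = true,
                                     .cluster_is_huge = true };
        struct swapin_state fail = { .thp_allocated = false,
                                     .cluster_is_huge = true };

        return choose_path(ok) == SWAPIN_HUGE &&
               choose_path(fail) == SWAPIN_FALLBACK ? 0 : 1;
    }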
It is similar to do_swap_page() for normal page swapin.

If THP allocation fails, the huge swap cluster and the PMD swap mapping
will be split, falling back to normal page swapin.  If the huge swap
cluster has already been split, the PMD swap mapping will be split,
likewise falling back to normal page swapin.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h |   9 +++
 mm/huge_memory.c        | 174 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 +++--
 3 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3fdb29bc250c..c2b8ced6fc2b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -403,4 +403,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f1358681db8f..4dbc4f933c4f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -1617,6 +1619,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	pmd_populate(mm, pmd, pgtable);
 }

+#ifdef CONFIG_THP_SWAP
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+			 */
+			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+				/*
+				 * Failed to allocate huge page, split huge swap
+				 * cluster, and fallback to swapin normal page
+				 */
+				ret = split_swap_cluster(entry, 0);
+				/* Somebody else swapin the swap entry, retry */
+				if (ret == -EEXIST) {
+					ret = 0;
+					goto retry;
+				/* swapoff occurs under us */
+				} else if (ret == -EINVAL)
+					ret = 0;
+				else
+					goto fallback;
+			}
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			goto out;
+		}
+
+		/* Had to read the page from swap area: Major fault */
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+	} else if (!PageTransCompound(page))
+		goto fallback;
+
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!locked) {
+		ret |= VM_FAULT_RETRY;
+		goto out_release;
+	}
+
+	/*
+	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+	 * release the swapcache from under us.  The page pin, and pmd_same
+	 * test below, are not enough to exclude that.  Even if it is still
+	 * swapcache, we need to check that the page's swap has not changed.
+	 */
+	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+		goto out_page;
+
+	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
+					&memcg, true)) {
+		ret = VM_FAULT_OOM;
+		goto out_page;
+	}
+
+	/*
+	 * Back out if somebody else already faulted in this pmd.
+	 */
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+		goto out_nomap;
+
+	if (unlikely(!PageUptodate(page))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_nomap;
+	}
+
+	/*
+	 * The page isn't present yet, go ahead with the fault.
+	 *
+	 * Be careful about the sequence of operations here.
+	 * To get its accounting right, reuse_swap_page() must be called
+	 * while the page is counted on swap but not yet in mapcount i.e.
+	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+	 * must be called after the swap_free(), or it will never succeed.
+	 */
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	pmd = mk_huge_pmd(page, vma->vm_page_prot);
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+		pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+		vmf->flags &= ~FAULT_FLAG_WRITE;
+		ret |= VM_FAULT_WRITE;
+		exclusive = RMAP_EXCLUSIVE;
+	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+	if (pmd_swp_soft_dirty(orig_pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+	do_page_add_anon_rmap(page, vma, haddr,
+			      exclusive | RMAP_COMPOUND);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	activate_page(page);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+	swap_free(entry, HPAGE_PMD_NR);
+	if (mem_cgroup_swap_full(page) ||
+	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+		try_to_free_swap(page);
+	unlock_page(page);
+
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		spin_unlock(vmf->ptl);
+		ret |= do_huge_pmd_wp_page(vmf, pmd);
+		if (ret & VM_FAULT_ERROR)
+			ret &= VM_FAULT_ERROR;
+		goto out;
+	}
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+out:
+	return ret;
+out_nomap:
+	mem_cgroup_cancel_charge(page, memcg, true);
+	spin_unlock(vmf->ptl);
+out_page:
+	unlock_page(page);
+out_release:
+	put_page(page);
+	return ret;
+fallback:
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+		ret = VM_FAULT_FALLBACK;
+	else
+		ret = 0;
+	if (page)
+		put_page(page);
+	return ret;
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.

diff --git a/mm/memory.c b/mm/memory.c
index e01e27afd2e8..eddc968de51e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4083,13 +4083,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		barrier();
 		if (unlikely(is_swap_pmd(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(orig_pmd));
-			if (is_pmd_migration_entry(orig_pmd))
+			if (thp_migration_supported() &&
+			    is_pmd_migration_entry(orig_pmd)) {
 				pmd_migration_entry_wait(mm, vmf.pmd);
-			return 0;
-		}
-		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+				return 0;
+			} else if (IS_ENABLED(CONFIG_THP_SWAP)) {
+				ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+				if (!(ret & VM_FAULT_FALLBACK))
+					return ret;
+			} else
+				VM_BUG_ON(1);
+		} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf, orig_pmd);
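As a rough illustration of the behavior this patch enables, here is a
minimal userspace sketch (not part of the patch; the 2MB PMD size and the
external memory pressure are assumptions) that requests a THP with
madvise(MADV_HUGEPAGE) and uses getrusage() to watch major fault counts.
On a kernel with this series, re-touching a region whose THP has been
swapped out should cost a single major fault for the whole 2MB instead of
one per 4KB page:

	/* Hypothetical demo: observe THP swapin as one major fault.
	 * Assumes a 2MB PMD size and that the region was swapped out
	 * by external memory pressure before the second touch.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/resource.h>

	#define THP_SIZE (2UL << 20)	/* assumed PMD size */

	static long majflt(void)
	{
		struct rusage ru;

		getrusage(RUSAGE_SELF, &ru);
		return ru.ru_majflt;
	}

	int main(void)
	{
		char *buf = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		long before, after;

		if (buf == MAP_FAILED)
			return 1;
		madvise(buf, THP_SIZE, MADV_HUGEPAGE);	/* request a THP */
		memset(buf, 1, THP_SIZE);		/* populate the region */

		/* ... wait until memory pressure swaps the THP out ... */

		before = majflt();
		buf[0] = 2;	/* with THP swapin: one fault for 2MB */
		after = majflt();
		printf("major faults for first touch: %ld\n", after - before);
		return 0;
	}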
From patchwork Wed Sep 12 00:44:03 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596539
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V5 RESEND 10/21] swap: Support to count THP swapin and its fallback
Date: Wed, 12 Sep 2018 08:44:03 +0800
Message-Id: <20180912004414.22583-11-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

Two new /proc/vmstat fields, "thp_swpin" and "thp_swpin_fallback", are
added to count swapping in a THP from the swap device in one piece and
falling back to normal page swapin, respectively.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/vm_event_item.h              |  2 ++
 mm/huge_memory.c                           |  4 +++-
 mm/page_io.c                               | 15 ++++++++++---
 mm/vmstat.c                                |  2 ++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..85e33f785fd7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -364,6 +364,14 @@ thp_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.

+thp_swpin
+	is incremented every time a huge page is swapin in one piece
+	without splitting.
+
+thp_swpin_fallback
+	is incremented if a huge page has to be split during swapin.
+	Usually because failed to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use.
 There are some counters in ``/proc/vmstat`` to help

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 5c7f010676a7..7b438548a78e 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_ZERO_PAGE_ALLOC_FAILED,
 		THP_SWPOUT,
 		THP_SWPOUT_FALLBACK,
+		THP_SWPIN,
+		THP_SWPIN_FALLBACK,
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 		BALLOON_INFLATE,

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4dbc4f933c4f..1232ade5deca 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1673,8 +1673,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			/* swapoff occurs under us */
 			} else if (ret == -EINVAL)
 				ret = 0;
-			else
+			else {
+				count_vm_event(THP_SWPIN_FALLBACK);
 				goto fallback;
+			}
 		}
 		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 		goto out;

diff --git a/mm/page_io.c b/mm/page_io.c
index aafd19ec1db4..362254b99955 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return ret;
 }

+static inline void count_swpin_vm_event(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (unlikely(PageTransHuge(page)))
+		count_vm_event(THP_SWPIN);
+#endif
+	count_vm_events(PSWPIN, hpage_nr_pages(page));
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
@@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous)
 		ret = mapping->a_ops->readpage(swap_file, page);
 		if (!ret)
-			count_vm_event(PSWPIN);
+			count_swpin_vm_event(page);
 		return ret;
 	}
@@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous)
 			unlock_page(page);
 		}
-		count_vm_event(PSWPIN);
+		count_swpin_vm_event(page);
 		return 0;
 	}
@@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	get_task_struct(current);
 	bio->bi_private = current;
 	bio_set_op_attrs(bio, REQ_OP_READ, 0);
-	count_vm_event(PSWPIN);
+	count_swpin_vm_event(page);
 	bio_get(bio);
 	qc = submit_bio(bio);
 	while (synchronous) {

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ba0870ecddd..ac04801bb0cb 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1263,6 +1263,8 @@ const char * const vmstat_text[] = {
 	"thp_zero_page_alloc_failed",
 	"thp_swpout",
 	"thp_swpout_fallback",
+	"thp_swpin",
+	"thp_swpin_fallback",
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 	"balloon_inflate",
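To see the new counters in action, a small userspace sketch (illustrative
only, not part of the patch) can read the two fields this patch adds to
/proc/vmstat:

	/* Illustrative reader for the thp_swpin / thp_swpin_fallback
	 * counters added to /proc/vmstat by this patch.
	 */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char name[64];
		unsigned long long val;
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return 1;
		/* each /proc/vmstat line is "<name> <value>" */
		while (fscanf(f, "%63s %llu", name, &val) == 2) {
			if (!strcmp(name, "thp_swpin") ||
			    !strcmp(name, "thp_swpin_fallback"))
				printf("%s = %llu\n", name, val);
		}
		fclose(f);
		return 0;
	}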
From patchwork Wed Sep 12 00:44:04 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596541
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V5 RESEND 11/21] swap: Add sysfs interface to configure THP swapin
Date: Wed, 12 Sep 2018 08:44:04 +0800
Message-Id: <20180912004414.22583-12-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

Swapping in a THP as a whole isn't desirable in some situations.  For
example, for a completely random access pattern, swapping in a THP in one
piece will inflate the read I/O greatly.  So a sysfs interface,
/sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to configure
it.  Three options are provided:

- always: THP swapin will always be enabled.
- madvise: THP swapin will be enabled only for VMAs with the VM_HUGEPAGE
  flag set.
- never: THP swapin will always be disabled.

The default configuration is "madvise".  During a page fault, if a PMD
swap mapping is found and THP swapin is disabled, the huge swap cluster
and the PMD swap mapping will be split, falling back to normal page
swapin.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++++
 include/linux/huge_mm.h                    | 31 ++++++++++
 mm/huge_memory.c                           | 94 ++++++++++++++++++++++------
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 85e33f785fd7..23aefb17101c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size

+Transparent hugepage may be swapout and swapin in one piece without
+splitting.  This will improve the utility of transparent hugepage but
+may inflate the read/write too.
+So whether to enable swapin transparent hugepage in one piece
+can be configured as follow.
+
+	echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+	Attempt to allocate a transparent huge page and read it from
+	swap space in one piece every time.
+
+never
+	Always split the swap space and PMD swap mapping and swapin
+	the fault normal page during swapin.
+
+madvise
+	Only swapin the transparent huge page in one piece for
+	MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c2b8ced6fc2b..9dedff974def 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -63,6 +63,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+	TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+	TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };

 struct kobject;
@@ -405,11 +407,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)

 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_NOHUGEPAGE)
+		return false;
+
+	if (is_vma_temporary_stack(vma))
+		return false;
+
+	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+		return true;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+		return !!(vma->vm_flags & VM_HUGEPAGE);
+
+	return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif /* CONFIG_THP_SWAP */

 #endif /* _LINUX_HUGE_MM_H */

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1232ade5deca..c4a766243a8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #endif
 	(1<[...]

[... several hunks lost in extraction; the patch resumes in
do_huge_pmd_swap_page(): ...]

 	page = lookup_swap_cache(entry, NULL, vmf->address);
 	if (!page) {
+		if (!transparent_hugepage_swapin_enabled(vma))
+			goto split;
+
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
 					     haddr, false);
 		if (!page) {
@@ -1660,24 +1714,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			 * Back out if somebody else faulted in this pmd
 			 * while we released the pmd lock.
 			 */
-			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
-				/*
-				 * Failed to allocate huge page, split huge swap
-				 * cluster, and fallback to swapin normal page
-				 */
-				ret = split_swap_cluster(entry, 0);
-				/* Somebody else swapin the swap entry, retry */
-				if (ret == -EEXIST) {
-					ret = 0;
-					goto retry;
-				/* swapoff occurs under us */
-				} else if (ret == -EINVAL)
-					ret = 0;
-				else {
-					count_vm_event(THP_SWPIN_FALLBACK);
-					goto fallback;
-				}
-			}
+			if (likely(pmd_same(*vmf->pmd, orig_pmd)))
+				goto split;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto out;
 		}
@@ -1790,6 +1828,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (page)
 		put_page(page);
 	return ret;
+split:
+	/*
+	 * Failed to allocate huge page, split huge swap cluster, and
+	 * fallback to swapin normal page
+	 */
+	ret = split_swap_cluster(entry, 0);
+	/* Somebody else swapin the swap entry, retry */
+	if (ret == -EEXIST) {
+		ret = 0;
+		goto retry;
+	}
+	/* swapoff occurs under us */
+	if (ret == -EINVAL) {
+		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+		return 0;
+	}
+	count_vm_event(THP_SWPIN_FALLBACK);
+	goto fallback;
 }
 #endif
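For illustration, a userspace sketch (assumptions: root privileges to
write the sysfs file added by this patch, and a 2MB PMD size) that selects
the "madvise" policy and then opts a mapping into THP swapin with
MADV_HUGEPAGE:

	/* Illustrative use of the swapin_enabled knob added by this
	 * patch, combined with MADV_HUGEPAGE so the VMA qualifies
	 * under the "madvise" policy.  Writing sysfs requires root.
	 */
	#include <stdio.h>
	#include <sys/mman.h>

	#define THP_SIZE (2UL << 20)	/* assumed PMD size */

	int main(void)
	{
		FILE *f = fopen(
			"/sys/kernel/mm/transparent_hugepage/swapin_enabled",
			"w");
		char *buf;

		if (f) {
			/* THP swapin only for VM_HUGEPAGE VMAs */
			fputs("madvise", f);
			fclose(f);
		}

		buf = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED)
			return 1;
		/* sets VM_HUGEPAGE, enabling THP swapin for this VMA */
		madvise(buf, THP_SIZE, MADV_HUGEPAGE);
		return 0;
	}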
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 RESEND 12/21] swap: Support PMD swap mapping in swapoff Date: Wed, 12 Sep 2018 08:44:05 +0800 Message-Id: <20180912004414.22583-13-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com> References: <20180912004414.22583-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +------ include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index eb1e9d17371b..d64cef2bff04 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
 	 */
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-	    (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+	    (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval)))
 		return 1;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9dedff974def..25ba9b5f1e60 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -406,6 +406,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);

 static inline bool transparent_hugepage_swapin_enabled(
@@ -431,6 +433,12 @@ static inline bool transparent_hugepage_swapin_enabled(
 	return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+				      unsigned long address, pmd_t orig_pmd)
+{
+	return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c4a766243a8f..cd353f39bed9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1671,8 +1671,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }

 #ifdef CONFIG_THP_SWAP
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			       unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			unsigned long address, pmd_t orig_pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3fe50f1da0a0..64067ee6a09c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
 	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }

+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+	return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	return ret;
 }

+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	struct mem_cgroup *memcg;
+	spinlock_t *ptl;
+	int ret = 1;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = -ENOMEM;
+		goto out_nolock;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+		mem_cgroup_cancel_charge(page, memcg, true);
+		ret = 0;
+		goto out;
+	}
+
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	get_page(page);
+	set_pmd_at(vma->vm_mm, addr, pmd,
+		   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr, true);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	swap_free(entry, HPAGE_PMD_NR);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+out:
+	spin_unlock(ptl);
+out_nolock:
+	return ret;
+}
+#else
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
@@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
 {
-	pmd_t *pmd;
+	pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
 	unsigned long next;
 	int ret;
@@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	do {
 		cond_resched();
 		next = pmd_addr_end(addr, end);
+		orig_pmd = *pmd;
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) {
+			if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+				continue;
+			/*
+			 * Huge cluster has been split already, split
+			 * PMD swap mapping and fallback to unuse PTE
+			 */
+			if (!PageTransCompound(page)) {
+				ret = split_huge_swap_pmd(vma, pmd,
+							  addr, orig_pmd);
+				if (ret)
+					return ret;
+				ret = unuse_pte_range(vma, pmd, addr,
+						      next, entry, page);
+			} else
+				ret = unuse_pmd(vma, pmd, addr, entry, page);
+			if (ret)
+				return ret;
+			continue;
+		}
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
 		if (signal_pending(current)) {
 			retval = -EINTR;
 			break;
 		}
@@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
 		page = read_swap_cache_async(entry,
 					GFP_HIGHUSER_MOVABLE, NULL, 0, false);
 		if (!page) {
+			struct swap_cluster_info *ci = NULL;
+
 			/*
 			 * Either swap_duplicate() failed because entry
 			 * has been freed independently, and will not be
[...]
 			 */
 			if (!swcount || swcount == SWAP_MAP_BAD)
 				continue;
+			if (si->cluster_info)
+				ci = si->cluster_info + i / SWAPFILE_CLUSTER;
+			/* Split huge cluster if failed to allocate huge page */
+			if (cluster_is_huge(ci)) {
+				retval = split_swap_cluster(entry, 0);
+				if (!retval || retval == -EEXIST)
+					goto retry;
+			}
 			retval = -ENOMEM;
 			break;
 		}
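From userspace, the path changed here is exercised simply by disabling a
swap device, which forces try_to_unuse() to swap everything back in, now
including whole THPs for PMD swap mappings where possible.  A minimal
sketch (the device path is a placeholder; requires root):

	/* Illustrative trigger for the swapoff path changed by this
	 * patch.  "/dev/sdb1" is a hypothetical swap device.
	 */
	#include <stdio.h>
	#include <sys/swap.h>

	int main(void)
	{
		if (swapoff("/dev/sdb1") != 0) {	/* placeholder path */
			perror("swapoff");
			return 1;
		}
		return 0;
	}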
From patchwork Wed Sep 12 00:44:06 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596545
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V5 RESEND 13/21] swap: Support PMD swap mapping in madvise_free()
Date: Wed, 12 Sep 2018 08:44:06 +0800
Message-Id: <20180912004414.22583-14-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

When madvise_free() finds a PMD swap mapping, if only part of the huge
swap cluster is operated on, the PMD swap mapping will be split, falling
back to PTE swap mapping processing.  Otherwise, if the whole huge swap
cluster is operated on, free_swap_and_cache() will be called to decrease
the PMD swap mapping count and probably free the swap space and the THP in
the swap cache too.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 mm/madvise.c     |  2 +-
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cd353f39bed9..05407832e793 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1849,6 +1849,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 #endif

+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+	pgtable_t pgtable;
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pte_free(mm, pgtable);
+	mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -1869,15 +1878,39 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		goto out_unlocked;

 	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
 	if (unlikely(!pmd_present(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		goto out;
+		swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+		if (is_migration_entry(entry)) {
+			VM_BUG_ON(!thp_migration_supported());
+			goto out;
+		} else if (IS_ENABLED(CONFIG_THP_SWAP) &&
+			   !non_swap_entry(entry)) {
+			/*
+			 * If part of THP is discarded, split the PMD
+			 * swap mapping and operate on the PTEs
+			 */
+			if (next - addr != HPAGE_PMD_SIZE) {
+				unsigned long haddr = addr & HPAGE_PMD_MASK;
+
+				__split_huge_swap_pmd(vma, haddr, pmd);
+				goto out;
+			}
+			free_swap_and_cache(entry, HPAGE_PMD_NR);
+			pmd_clear(pmd);
+			zap_deposited_table(mm, pmd);
+			if (current->mm == mm)
+				sync_mm_rss(mm);
+			add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+			ret = true;
+			goto out;
+		} else
+			VM_BUG_ON(1);
 	}

+	if (is_huge_zero_pmd(orig_pmd))
+		goto out;
+
 	page = pmd_page(orig_pmd);
 	/*
 	 * If other processes are mapping this page, we couldn't discard
@@ -1923,15 +1956,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }

-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-	pgtable_t pgtable;
-
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-	pte_free(mm, pgtable);
-	mm_dec_nr_ptes(mm);
-}
-
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {

diff --git a/mm/madvise.c b/mm/madvise.c
index 6fff1c1d2009..07ef599d4255 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		unsigned long next;

 		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*pmd))
+		if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd))
 			if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
 				goto next;
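A userspace sketch of the two cases the commit message describes (the 2MB
PMD size and the alignment of the mapping are assumptions): MADV_FREE over
a whole 2MB unit can drop the PMD swap mapping in one go, while MADV_FREE
over part of it splits the mapping and falls back to PTE processing.

	/* Illustrative MADV_FREE calls against a (possibly swapped-out)
	 * THP region: a whole-unit call vs. a partial call that forces
	 * a split.  Assumes the mapping happens to be 2MB-aligned and
	 * was populated (and swapped out) beforehand.
	 */
	#include <sys/mman.h>

	#define THP_SIZE (2UL << 20)	/* assumed PMD size */

	int main(void)
	{
		char *buf = mmap(NULL, 2 * THP_SIZE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		madvise(buf, 2 * THP_SIZE, MADV_HUGEPAGE);

		/* whole unit: PMD swap mapping can be freed in one piece */
		madvise(buf, THP_SIZE, MADV_FREE);

		/* partial range: PMD swap mapping is split, PTEs processed */
		madvise(buf + THP_SIZE, THP_SIZE / 2, MADV_FREE);
		return 0;
	}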
From patchwork Wed Sep 12 00:44:07 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596547
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 14/21] swap: Support to move swap account for PMD
 swap mapping
Date: Wed, 12 Sep 2018 08:44:07 +0800
Message-Id: <20180912004414.22583-15-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP had been
swapped out.  Now, to support swapping the THP in as one piece, the huge
swap cluster is not split after the THP is reclaimed.  So in memcg, we
need to move the swap account for PMD swap mappings in a process's page
table.

When the page table is scanned while moving the memcg charge, PMD swap
mappings are identified, and mem_cgroup_move_swap_account() and its
callees are revised to move the account for the whole huge swap cluster.
If the swap cluster mapped by the PMD has already been split, the PMD
swap mapping is split too and we fall back to PTE processing.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 ++++ include/linux/swap.h | 6 +++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 8 +-- mm/memcontrol.c | 129 ++++++++++++++++++++++++++++++++++---------- mm/swap_cgroup.c | 45 +++++++++++++--- mm/swapfile.c | 14 +++++ 7 files changed, 174 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 25ba9b5f1e60..6586c1bfac21 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -406,6 +406,9 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -433,6 +436,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index 921abd07e13f..d45c3a7746e0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -621,6 +621,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -631,6 +632,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 05407832e793..f98d8a543d73 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1636,10 +1636,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long haddr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1670,7 +1671,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) 
 {

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcec9b39e2a3..6c2527ffd17d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2682,9 +2682,10 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 #ifdef CONFIG_MEMCG_SWAP
 /**
  * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
- * @entry: swap entry to be moved
+ * @entry: the first swap entry to be moved
  * @from:  mem_cgroup which the entry is moved from
  * @to:  mem_cgroup which the entry is moved to
+ * @nr_ents: number of swap entries
  *
  * It succeeds only when the swap_cgroup's record for this entry is the same
  * as the mem_cgroup's id of @from.
@@ -2695,23 +2696,27 @@ void mem_cgroup_split_huge_fixup(struct page *head)
  * both res and memsw, and called css_get().
  */
 static int mem_cgroup_move_swap_account(swp_entry_t entry,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+					struct mem_cgroup *from,
+					struct mem_cgroup *to,
+					unsigned int nr_ents)
 {
 	unsigned short old_id, new_id;
 
 	old_id = mem_cgroup_id(from);
 	new_id = mem_cgroup_id(to);
 
-	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
-		mod_memcg_state(from, MEMCG_SWAP, -1);
-		mod_memcg_state(to, MEMCG_SWAP, 1);
+	if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) {
+		mod_memcg_state(from, MEMCG_SWAP, -nr_ents);
+		mod_memcg_state(to, MEMCG_SWAP, nr_ents);
 		return 0;
 	}
 	return -EINVAL;
 }
 #else
 static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+					       struct mem_cgroup *from,
+					       struct mem_cgroup *to,
+					       unsigned int nr_ents)
 {
 	return -EINVAL;
 }
@@ -4666,6 +4671,7 @@ enum mc_target_type {
 	MC_TARGET_PAGE,
 	MC_TARGET_SWAP,
 	MC_TARGET_DEVICE,
+	MC_TARGET_FALLBACK,
 };
 
 static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
@@ -4732,6 +4738,26 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 }
 #endif
 
+static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
+			pmd_t pmd, swp_entry_t *entry)
+{
+	struct page *page = NULL;
+	swp_entry_t ent = pmd_to_swp_entry(pmd);
+
+	if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent))
+		return NULL;
+
+	/*
+	 * Because lookup_swap_cache() updates some statistics counter,
+	 * we call find_get_page() with swapper_space directly.
+	 */
+	page = find_get_page(swap_address_space(ent), swp_offset(ent));
+	if (do_memsw_account())
+		entry->val = ent.val;
+
+	return page;
+}
+
 static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 			unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
@@ -4920,7 +4946,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 	 * There is a swap entry and a page doesn't exist or isn't charged.
 	 * But we cannot move a tail-page in a THP.
 	 */
-	if (ent.val && !ret && (!page || !PageTransCompound(page)) &&
+	if (ent.val && !ret &&
+	    ((page && !PageTransCompound(page)) ||
+	     (!page && get_swap_entry_size(ent) == 1)) &&
 	    mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
 		ret = MC_TARGET_SWAP;
 		if (target)
@@ -4931,37 +4959,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
- * We don't consider PMD mapped swapping or file mapped pages because THP does
- * not support them for now.
- * Caller should make sure that pmd_trans_huge(pmd) is true.
+ * We don't consider file mapped pages because THP does not support
+ * them for now.
 */
 static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t pmd, union mc_target *target)
+		unsigned long addr, pmd_t *pmdp, union mc_target *target)
 {
+	pmd_t pmd = *pmdp;
 	struct page *page = NULL;
 	enum mc_target_type ret = MC_TARGET_NONE;
+	swp_entry_t ent = { .val = 0 };
 
 	if (unlikely(is_swap_pmd(pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(pmd));
-		return ret;
+		if (is_pmd_migration_entry(pmd)) {
+			VM_BUG_ON(!thp_migration_supported());
+			return ret;
+		}
+		if (!IS_ENABLED(CONFIG_THP_SWAP)) {
+			VM_BUG_ON(1);
+			return ret;
+		}
+		page = mc_handle_swap_pmd(vma, pmd, &ent);
+		/* The swap cluster has been split under us */
+		if ((page && !PageTransHuge(page)) ||
+		    (!page && ent.val && get_swap_entry_size(ent) == 1)) {
+			__split_huge_swap_pmd(vma, addr, pmdp);
+			ret = MC_TARGET_FALLBACK;
+			goto out;
+		}
+	} else {
+		page = pmd_page(pmd);
+		get_page(page);
 	}
-	page = pmd_page(pmd);
-	VM_BUG_ON_PAGE(!page || !PageHead(page), page);
+	VM_BUG_ON_PAGE(page && !PageHead(page), page);
 	if (!(mc.flags & MOVE_ANON))
-		return ret;
-	if (page->mem_cgroup == mc.from) {
+		goto out;
+	if (!page && !ent.val)
+		goto out;
+	if (page && page->mem_cgroup == mc.from) {
 		ret = MC_TARGET_PAGE;
 		if (target) {
 			get_page(page);
 			target->page = page;
 		}
 	}
+	if (ent.val && !ret && !page &&
+	    mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
+		ret = MC_TARGET_SWAP;
+		if (target)
+			target->ent = ent;
+	}
+out:
+	if (page)
+		put_page(page);
 	return ret;
 }
 #else
 static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t pmd, union mc_target *target)
+		unsigned long addr, pmd_t *pmdp, union mc_target *target)
 {
 	return MC_TARGET_NONE;
 }
@@ -4974,6 +5029,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *pte;
 	spinlock_t *ptl;
+	int ret;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -4982,12 +5038,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 		 * support transparent huge page with MEMORY_DEVICE_PUBLIC or
 		 * MEMORY_DEVICE_PRIVATE but this might change.
 		 */
-		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
-			mc.precharge += HPAGE_PMD_NR;
+		ret = get_mctgt_type_thp(vma, addr, pmd, NULL);
 		spin_unlock(ptl);
+		if (ret == MC_TARGET_FALLBACK)
+			goto fallback;
+		if (ret)
+			mc.precharge += HPAGE_PMD_NR;
 		return 0;
 	}
 
+fallback:
 	if (pmd_trans_unstable(pmd))
 		return 0;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -5178,6 +5238,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 	enum mc_target_type target_type;
 	union mc_target target;
 	struct page *page;
+	swp_entry_t ent;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -5185,8 +5246,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			spin_unlock(ptl);
 			return 0;
 		}
-		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
-		if (target_type == MC_TARGET_PAGE) {
+		target_type = get_mctgt_type_thp(vma, addr, pmd, &target);
+		switch (target_type) {
+		case MC_TARGET_PAGE:
 			page = target.page;
 			if (!isolate_lru_page(page)) {
 				if (!mem_cgroup_move_account(page, true,
@@ -5197,7 +5259,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 				putback_lru_page(page);
 			}
 			put_page(page);
-		} else if (target_type == MC_TARGET_DEVICE) {
+			break;
+		case MC_TARGET_DEVICE:
 			page = target.page;
 			if (!mem_cgroup_move_account(page, true,
 						     mc.from, mc.to)) {
@@ -5205,9 +5268,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 				mc.moved_charge += HPAGE_PMD_NR;
 			}
 			put_page(page);
+			break;
+		case MC_TARGET_SWAP:
+			ent = target.ent;
+			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to,
+							  HPAGE_PMD_NR)) {
+				mc.precharge -= HPAGE_PMD_NR;
+				mc.moved_swap += HPAGE_PMD_NR;
+			}
+			break;
+		default:
+			break;
 		}
 		spin_unlock(ptl);
-		return 0;
+		if (target_type != MC_TARGET_FALLBACK)
+			return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -5217,7 +5292,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 	for (; addr != end; addr += PAGE_SIZE) {
 		pte_t ptent = *(pte++);
 		bool device = false;
-		swp_entry_t ent;
 
 		if (!mc.precharge)
 			break;
@@ -5251,7 +5325,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			break;
 		case MC_TARGET_SWAP:
 			ent = target.ent;
-			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
+			if (!mem_cgroup_move_swap_account(ent, mc.from,
+							  mc.to, 1)) {
 				mc.precharge--;
 				/* we fixup refcnts and charges later. */
 				mc.moved_swap++;

diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 45affaef3bc6..ccc08e88962a 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
 
 /**
  * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry.
- * @ent: swap entry to be cmpxchged
+ * @ent: the first swap entry to be cmpxchged
  * @old: old id
  * @new: new id
+ * @nr_ents: number of swap entries
  *
  * Returns old id at success, 0 at failure.
 * (There is no mem_cgroup using 0 as its id)
 */
unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
-					unsigned short old, unsigned short new)
+				   unsigned short old, unsigned short new,
+				   unsigned int nr_ents)
{
	struct swap_cgroup_ctrl *ctrl;
-	struct swap_cgroup *sc;
+	struct swap_cgroup *sc_start, *sc;
	unsigned long flags;
	unsigned short retval;
+	pgoff_t offset_start = swp_offset(ent), offset;
+	pgoff_t end = offset_start + nr_ents;
 
-	sc = lookup_swap_cgroup(ent, &ctrl);
+	sc_start = lookup_swap_cgroup(ent, &ctrl);
	spin_lock_irqsave(&ctrl->lock, flags);
-	retval = sc->id;
-	if (retval == old)
+	sc = sc_start;
+	offset = offset_start;
+	for (;;) {
+		if (sc->id != old) {
+			retval = 0;
+			goto out;
+		}
+		offset++;
+		if (offset == end)
+			break;
+		if (offset % SC_PER_PAGE)
+			sc++;
+		else
+			sc = __lookup_swap_cgroup(ctrl, offset);
+	}
+
+	sc = sc_start;
+	offset = offset_start;
+	for (;;) {
		sc->id = new;
-	else
-		retval = 0;
+		offset++;
+		if (offset == end)
+			break;
+		if (offset % SC_PER_PAGE)
+			sc++;
+		else
+			sc = __lookup_swap_cgroup(ctrl, offset);
+	}
+	retval = old;
+out:
	spin_unlock_irqrestore(&ctrl->lock, flags);
	return retval;
}

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 64067ee6a09c..bff2cb7badbb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
 	return map_swapcount;
 }
 
+#ifdef CONFIG_THP_SWAP
+int get_swap_entry_size(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+
+	si = _swap_info_get(entry);
+	if (!si || !si->cluster_info)
+		return 1;
+	ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER;
+	return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1;
+}
+#endif
+
 /*
  * We can write to an anon page without COW if there are no other references
  * to it.  And as a side-effect, free up its swap: because the old content
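The reworked swap_cgroup_cmpxchg() above generalizes a single-slot
compare-and-exchange to nr_ents consecutive slots: under one lock
acquisition it first verifies that every slot still holds the old id,
and only then writes the new id, so a failure has no side effects.  A
plain-C sketch of the same two-pass pattern (array and names are
hypothetical stand-ins, not kernel API):

	/*
	 * Illustrative two-pass compare-then-store over a range; the
	 * caller is assumed to hold the lock protecting ids[].  Mirrors
	 * the structure of swap_cgroup_cmpxchg() above; not kernel code.
	 */
	static unsigned short range_cmpxchg(unsigned short *ids,
					    unsigned int nr,
					    unsigned short old,
					    unsigned short new)
	{
		unsigned int i;

		/* Pass 1: fail with no side effects unless all match. */
		for (i = 0; i < nr; i++)
			if (ids[i] != old)
				return 0;	/* 0 is never a valid id */
		/* Pass 2: the update can no longer fail halfway through. */
		for (i = 0; i < nr; i++)
			ids[i] = new;
		return old;
	}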
From patchwork Wed Sep 12 00:44:08 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596549
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 15/21] swap: Support to copy PMD swap mapping
 when fork()
Date: Wed, 12 Sep 2018 08:44:08 +0800
Message-Id: <20180912004414.22583-16-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

During fork, the page table needs to be copied from the parent to the
child.  A PMD swap mapping needs to be copied too, with the swap
reference count increased.  When the huge swap cluster has already been
split, we need to split the PMD swap mapping and fall back to PTE
copying.  When swap-count continuation fails to allocate a page with
GFP_ATOMIC, we need to unlock the spinlocks and try again with
GFP_KERNEL.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index f98d8a543d73..4e2230583c53 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -941,6 +941,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -948,26 +949,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
+				if (unlikely(list_empty(&dst_mm->mmlist))) {
+					spin_lock(&mmlist_lock);
+					if (list_empty(&dst_mm->mmlist))
+						list_add(&dst_mm->mmlist,
+							 &src_mm->mmlist);
+					spin_unlock(&mmlist_lock);
+				}
+			} else if (ret == -ENOTDIR) {
+				/*
+				 * The huge swap cluster has been split, split
+				 * the PMD swap mapping and fallback to PTE
+				 */
+				__split_huge_swap_pmd(vma, addr, src_pmd);
+				pte_free(dst_mm, pgtable);
+			} else if (ret == -ENOMEM) {
+				spin_unlock(src_ptl);
+				spin_unlock(dst_ptl);
+				ret = add_swap_count_continuation(entry,
+								  GFP_KERNEL);
+				if (ret < 0) {
+					ret = -ENOMEM;
+					pte_free(dst_mm, pgtable);
+					goto out;
+				}
+				goto retry;
+			} else
+				VM_BUG_ON(1);
+			goto out_unlock;
+		}
+		VM_BUG_ON(1);
+	}
 
 	if (unlikely(!pmd_trans_huge(pmd))) {
 		pte_free(dst_mm, pgtable);
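The -ENOMEM leg above follows a common kernel pattern: an allocation
that cannot be done atomically under the page table locks is retried
after dropping the locks and allocating with a blocking GFP.  A
condensed userspace sketch of the same control flow (the pthread lock
and all names are stand-ins, not the kernel API):

	/*
	 * Illustrative only: drop-the-lock, allocate, retry; recast with
	 * pthreads.  'reserve' stands in for the swap-count continuation
	 * page that swap_duplicate() may need.
	 */
	#include <errno.h>
	#include <pthread.h>
	#include <stdlib.h>

	static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
	static void *reserve;

	static int try_copy_atomic(void)
	{
		/* Under the lock, only preallocated memory may be used. */
		return reserve ? 0 : -ENOMEM;
	}

	static int copy_with_retry(void)
	{
	retry:
		pthread_mutex_lock(&table_lock);
		if (try_copy_atomic() == -ENOMEM) {
			pthread_mutex_unlock(&table_lock);
			reserve = malloc(4096);	/* sleeping is safe here */
			if (!reserve)
				return -ENOMEM;
			goto retry;	/* re-take the lock, redo the check */
		}
		/* ... perform the copy using the reservation ... */
		pthread_mutex_unlock(&table_lock);
		return 0;
	}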
From patchwork Wed Sep 12 00:44:09 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596551
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 RESEND 16/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Wed, 12 Sep 2018 08:44:09 +0800 Message-Id: <20180912004414.22583-17-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com> References: <20180912004414.22583-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4e2230583c53..d4e8b4f80543 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2024,7 +2024,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2037,17 +2037,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2055,7 +2065,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Wed Sep 12 00:44:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10596553 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org 
From patchwork Wed Sep 12 00:44:10 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596553
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 17/21] swap: Support PMD swap mapping for
 MADV_WILLNEED
Date: Wed, 12 Sep 2018 08:44:10 +0800
Message-Id: <20180912004414.22583-18-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled
for the VMA, the whole swap cluster will be swapped in.  Otherwise, the
huge swap cluster and the PMD swap mapping will be split and we fall
back to PTE swap mappings.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 07ef599d4255..608c5ae201c6 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Wed Sep 12 00:44:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10596555 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 76998920 for ; Wed, 12 Sep 2018 00:45:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6482529AC1 for ; Wed, 12 Sep 2018 00:45:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 44AA529AD2; Wed, 12 Sep 2018 00:45:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A03E829AC1 for ; Wed, 12 Sep 2018 00:45:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EC62E8E0014; Tue, 11 Sep 2018 20:45:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E76678E0011; Tue, 11 Sep 2018 20:45:28 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D17B48E0014; Tue, 11 Sep 2018 20:45:28 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 8D4358E0011 for ; Tue, 11 Sep 2018 20:45:28 -0400 (EDT) Received: by mail-pg1-f197.google.com with SMTP id s77-v6so120299pgs.2 for ; Tue, 11 Sep 2018 17:45:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From patchwork Wed Sep 12 00:44:11 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10596555
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 18/21] swap: Support PMD swap mapping in mincore()
Date: Wed, 12 Sep 2018 08:44:11 +0800
Message-Id: <20180912004414.22583-19-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache will be looked
up.  If the resulting page isn't a compound page, the PMD swap mapping
will be split and we fall back to PTE swap mapping processing.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/mincore.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index a66f2052c7b1..a2a66c3c8c6a 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 * and is up to date; i.e. that no page-in operation would be required
 * at this time if an application were to map and access this page.
 */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff,
+				  bool *compound)
 {
 	unsigned char present = 0;
 	struct page *page;
@@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 #endif
 	if (page) {
 		present = PageUptodate(page);
+		if (compound)
+			*compound = PageCompound(page);
 		put_page(page);
 	}
 
@@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			vec[i] = mincore_page(vma->vm_file->f_mapping,
+					      pgoff, NULL);
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *ptep;
 	unsigned char *vec = walk->private;
 	int nr = (end - addr) >> PAGE_SHIFT;
+	swp_entry_t entry;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
-		memset(vec, 1, nr);
+		unsigned char val = 1;
+		bool compound;
+
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) {
+			entry = pmd_to_swp_entry(*pmd);
+			if (!non_swap_entry(entry)) {
+				val = mincore_page(swap_address_space(entry),
+						   swp_offset(entry),
+						   &compound);
+				/*
+				 * The huge swap cluster has been
+				 * split under us
+				 */
+				if (!compound) {
+					__split_huge_swap_pmd(vma, addr, pmd);
+					spin_unlock(ptl);
+					goto fallback;
+				}
+			}
+		}
+		memset(vec, val, nr);
 		spin_unlock(ptl);
 		goto out;
 	}
 
+fallback:
 	if (pmd_trans_unstable(pmd)) {
 		__mincore_unmapped_range(addr, end, vma, vec);
 		goto out;
@@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
+			entry = pte_to_swp_entry(pte);
 			if (non_swap_entry(entry)) {
 				/*
 				 * migration or hwpoison entries are always
@@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			} else {
 #ifdef CONFIG_SWAP
 				*vec = mincore_page(swap_address_space(entry),
-						    swp_offset(entry));
+						    swp_offset(entry), NULL);
 #else
 				WARN_ON(1);
 				*vec = 1;
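The corresponding userspace interface is mincore(2), which reports
residency per 4KB page regardless of the mapping size.  An illustrative
helper (the names are ours, and the 2MB/4KB geometry is an assumption):

	/*
	 * Illustrative only: count resident pages in a range via
	 * mincore().  With this patch, a PMD swap mapping is answered
	 * from the swap cache without splitting the huge mapping first.
	 */
	#include <sys/mman.h>
	#include <unistd.h>

	static int count_resident(void *addr, size_t len)
	{
		long psize = sysconf(_SC_PAGESIZE);
		size_t pages = (len + psize - 1) / psize;
		unsigned char vec[512];	/* enough for one 2MB region */
		size_t i;
		int resident = 0;

		if (pages > sizeof(vec) || mincore(addr, len, vec) != 0)
			return -1;
		for (i = 0; i < pages; i++)
			resident += vec[i] & 1;
		return resident;
	}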
From patchwork Wed Sep 12 00:44:12 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596557
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 19/21] swap: Support PMD swap mapping in common path
Date: Wed, 12 Sep 2018 08:44:12 +0800
Message-Id: <20180912004414.22583-20-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

The original code handles only the PMD migration entry; it is revised here to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
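The hunks that follow repeatedly replace #ifdef blocks with IS_ENABLED() tests so that both branches stay visible to the compiler. A small userspace mimic of the idiom, with the kernel's <linux/kconfig.h> macro and the pmd test stubbed out (everything below is illustrative, not kernel code):

```c
#include <stdio.h>

/* Stand-ins: in the kernel, IS_ENABLED() comes from <linux/kconfig.h>
 * and evaluates to 1 or 0 depending on the Kconfig option. */
#define CONFIG_HAVE_PMD_SWAP_ENTRY 1		/* pretend "=y" */
#define IS_ENABLED(option) (option)		/* simplified stub */

static int is_swap_pmd_stub(unsigned long pmdval)
{
	return pmdval & 1;			/* illustrative test */
}

int main(void)
{
	unsigned long pmdval = 0x3;

	/* Unlike #ifdef, both branches are always parsed and type-checked;
	 * when the option is off, the compiler eliminates the dead branch. */
	if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && is_swap_pmd_stub(pmdval))
		puts("handle as PMD swap entry");
	else
		puts("handle as present PMD");
	return 0;
}
```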
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- fs/proc/task_mmu.c | 12 +++++------- mm/gup.c | 36 ++++++++++++++++++++++++------------ mm/huge_memory.c | 7 ++++--- mm/mempolicy.c | 2 +- 4 files changed, 34 insertions(+), 23 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5ea1d64cb0b4..2d968523c57b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -972,7 +972,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, pmd = pmd_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { + } else if (is_swap_pmd(pmd)) { pmd = pmd_swp_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } @@ -1302,9 +1302,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pm->show_pfn) frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - else if (is_swap_pmd(pmd)) { + } else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && + is_swap_pmd(pmd)) { swp_entry_t entry = pmd_to_swp_entry(pmd); unsigned long offset; @@ -1317,10 +1316,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, flags |= PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |= PM_SOFT_DIRTY; - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - page = migration_entry_to_page(entry); + if (is_pmd_migration_entry(pmd)) + page = migration_entry_to_page(entry); } -#endif if (page && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; diff --git a/mm/gup.c b/mm/gup.c index 1abc8b4afff6..b35b7729b1b7 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -216,6 +216,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spinlock_t *ptl; struct page *page; struct mm_struct *mm = vma->vm_mm; + swp_entry_t entry; pmd = pmd_offset(pudp, address); /* @@ -243,18 +244,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (!pmd_present(pmdval)) { if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmdval)); - if (is_pmd_migration_entry(pmdval)) + entry = pmd_to_swp_entry(pmdval); + if (thp_migration_supported() && is_migration_entry(entry)) { pmd_migration_entry_wait(mm, pmd); - pmdval = READ_ONCE(*pmd); - /* - * MADV_DONTNEED may convert the pmd to null because - * mmap_sem is held in read mode - */ - if (pmd_none(pmdval)) + pmdval = READ_ONCE(*pmd); + /* + * MADV_DONTNEED may convert the pmd to null because + * mmap_sem is held in read mode + */ + if (pmd_none(pmdval)) + return no_page_table(vma, flags); + goto retry; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) return no_page_table(vma, flags); - goto retry; + WARN_ON(1); + return no_page_table(vma, flags); } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); @@ -276,11 +281,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); } if (unlikely(!pmd_present(*pmd))) { + entry = pmd_to_swp_entry(*pmd); spin_unlock(ptl); if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - pmd_migration_entry_wait(mm, pmd); - goto retry_locked; + if (thp_migration_supported() && is_migration_entry(entry)) { + pmd_migration_entry_wait(mm, pmd); + goto retry_locked; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) + return no_page_table(vma, 
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4e8b4f80543..2aa432830a38 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2091,7 +2091,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
 #ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(is_pmd_migration_entry(pmd)))
+	if (unlikely(is_swap_pmd(pmd)))
 		pmd = pmd_swp_mksoft_dirty(pmd);
 	else if (pmd_present(pmd))
 		pmd = pmd_mksoft_dirty(pmd);
@@ -2177,11 +2177,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	preserve_write = prot_numa && pmd_write(*pmd);
 	ret = 1;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) &&
+			  !is_migration_entry(entry));
 		if (is_write_migration_entry(entry)) {
 			pmd_t newpmd;
 			/*

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2e76a8f65e94..32f752d08d09 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	struct queue_pages *qp = walk->private;
 	unsigned long flags;
 
-	if (unlikely(is_pmd_migration_entry(*pmd))) {
+	if (unlikely(is_swap_pmd(*pmd))) {
 		ret = 1;
 		goto unlock;
 	}
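The fs/proc/task_mmu.c hunk above makes /proc/<pid>/pagemap report PMD swap entries the same way as PTE ones. A hedged userspace sketch of decoding a pagemap word for a swapped page, assuming the documented pagemap bit layout (bits 0-4 swap type, bits 5-54 swap offset, bit 62 set when the page is swapped); the sample value is fabricated:

```c
#include <stdint.h>
#include <stdio.h>

#define PM_SWAP_BIT	((uint64_t)1 << 62)

int main(void)
{
	/* Fabricated example entry: swapped, type 1, offset 0x1234. */
	uint64_t pm = PM_SWAP_BIT | ((uint64_t)0x1234 << 5) | 1;

	if (pm & PM_SWAP_BIT) {
		unsigned int type = pm & 0x1f;		     /* bits 0-4 */
		uint64_t offset = (pm >> 5) &
				  (((uint64_t)1 << 50) - 1); /* bits 5-54 */

		printf("swapped: type=%u offset=%#llx\n",
		       type, (unsigned long long)offset);
	}
	return 0;
}
```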
From patchwork Wed Sep 12 00:44:13 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596559
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 RESEND 20/21] swap: create PMD swap mapping when unmap the THP
Date: Wed, 12 Sep 2018 08:44:13 +0800
Message-Id: <20180912004414.22583-21-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

This is the final step of THP swapin support. When an anonymous THP is reclaimed, after allocating a huge swap cluster and adding the THP to the swap cache, the PMD page mapping is changed to a mapping into the swap space. Previously, the PMD page mapping was split before being changed. In this patch, the unmap code is enhanced not to split the PMD mapping but to replace it with a PMD swap mapping instead. So later, when the SWAP_HAS_CACHE flag is cleared in the last step of swapout, the huge swap cluster is kept instead of being split; and on swapin, the huge swap cluster is read in one piece into a THP. That is, the THP is not split during swapout/swapin. This eliminates the splitting/collapsing overhead and reduces the page fault count. More importantly, THP utilization improves greatly: many more THPs are kept while swapping is used, so we can take full advantage of THP, including its high swapout/swapin performance.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
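A PMD swap mapping stores a single swp_entry_t covering the whole huge swap cluster (HPAGE_PMD_NR pages, 512 on x86-64). A userspace mimic of the type/offset packing performed by the kernel's swp_entry()/swp_type()/swp_offset() helpers; the bit split chosen below is illustrative, not the kernel's arch-dependent layout:

```c
#include <stdio.h>

#define SWP_TYPE_SHIFT	58	/* illustrative split, not the real layout */
#define HPAGE_PMD_NR	512	/* 2MB PMD / 4KB pages on x86-64 */

typedef struct { unsigned long val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
	/* Pack which swap device (type) and where on it (offset). */
	return (swp_entry_t){ .val = (type << SWP_TYPE_SHIFT) | offset };
}

static unsigned long swp_type(swp_entry_t e)
{
	return e.val >> SWP_TYPE_SHIFT;
}

static unsigned long swp_offset(swp_entry_t e)
{
	return e.val & ((1UL << SWP_TYPE_SHIFT) - 1);
}

int main(void)
{
	/* A cluster-aligned offset, as a huge swap cluster would use. */
	swp_entry_t e = swp_entry(0, 3UL * HPAGE_PMD_NR);

	printf("type=%lu offset=%lu\n", swp_type(e), swp_offset(e));
	return 0;
}
```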
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 11 +++++++++++ mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++ mm/rmap.c | 43 ++++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 6 +----- 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 6586c1bfac21..8cbce31bc090 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -405,6 +405,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +struct page_vma_mapped_walk; + #ifdef CONFIG_THP_SWAP extern void __split_huge_swap_pmd(struct vm_area_struct *vma, unsigned long haddr, @@ -412,6 +414,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma, extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, pmd_t pmdval); static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) @@ -453,6 +457,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) return 0; } +static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, + pmd_t pmdval) +{ + return false; +} + static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2aa432830a38..542af5836ca5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1889,6 +1889,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) count_vm_event(THP_SWPIN_FALLBACK); goto fallback; } + +bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page, + unsigned long address, pmd_t pmdval) +{ + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + pmd_t swp_pmd; + swp_entry_t entry = { .val = page_private(page) }; + + if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return false; + } + if (list_empty(&mm->mmlist)) { + spin_lock(&mmlist_lock); + if (list_empty(&mm->mmlist)) + list_add(&mm->mmlist, &init_mm.mmlist); + spin_unlock(&mmlist_lock); + } + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw->pmd, swp_pmd); + + page_remove_rmap(page, true); + put_page(page); + return true; +} #endif static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) diff --git a/mm/rmap.c b/mm/rmap.c index 3bb4be720bc0..a180cb1fe2db 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, continue; } + address = pvmw.address; + +#ifdef CONFIG_THP_SWAP + /* PMD-mapped THP swap entry */ + if (IS_ENABLED(CONFIG_THP_SWAP) && + !pvmw.pte && PageAnon(page)) { + pmd_t pmdval; + + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + flush_cache_range(vma, address, + address + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(mm, address, + address + HPAGE_PMD_SIZE); + if 
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa2c150ab7b9..45968f23462f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1315,11 +1315,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
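set_pmd_swap_entry() above uses the common check/lock/recheck idiom for mmlist: the unlocked emptiness test keeps mmlist_lock off the hot path, and the test is repeated under the lock because another task may have linked the mm meanwhile. A generic userspace sketch of the same idiom (pthread-based, not kernel code; a strictly conforming version would make the unlocked read atomic):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static bool on_list;	/* stands in for !list_empty(&mm->mmlist) */

static void add_once(void)
{
	if (!on_list) {				/* cheap unlocked check */
		pthread_mutex_lock(&list_lock);
		if (!on_list)			/* recheck under the lock */
			on_list = true;		/* actually link the node */
		pthread_mutex_unlock(&list_lock);
	}
}

int main(void)	/* build with -pthread */
{
	add_once();
	add_once();	/* second call sees on_list and skips the lock */
	printf("on_list=%d\n", on_list);
	return 0;
}
```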
From patchwork Wed Sep 12 00:44:14 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10596561
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan, Dan Williams
Subject: [PATCH -V5 RESEND 21/21] swap: Update help of CONFIG_THP_SWAP
Date: Wed, 12 Sep 2018 08:44:14 +0800
Message-Id: <20180912004414.22583-22-ying.huang@intel.com>
In-Reply-To: <20180912004414.22583-1-ying.huang@intel.com>
References: <20180912004414.22583-1-ying.huang@intel.com>

The help text of CONFIG_THP_SWAP is updated to reflect the latest progress of the THP (Transparent Huge Page) swap optimization.

Signed-off-by: "Huang, Ying"
Reviewed-by: Dan Williams
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 9a6e7e27e8d5..cd41bc4382bf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -425,8 +425,6 @@ config THP_SWAP
 	depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP
 	help
 	  Swap transparent huge pages in one piece, without splitting.
-	  XXX: For now, swap cluster backing transparent huge page
-	  will be split after swapout.
 
 	  For selection by architectures with reasonable THP sizes.