From: Zach O'Keefe
Date: Fri, 26 Aug 2022 15:03:21 -0700
Subject: [PATCH mm-unstable v2 2/9] mm/khugepaged: attempt to map
 file/shmem-backed pte-mapped THPs by pmds
Message-ID: <20220826220329.1495407-3-zokeefe@google.com>
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>
References: <20220826220329.1495407-1-zokeefe@google.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-api@vger.kernel.org, Axel Rasmussen,
 James Houghton, Hugh Dickins, Yang Shi, Miaohe Lin, David Hildenbrand,
 David Rientjes, Matthew Wilcox, Pasha Tatashin, Peter Xu, Rongwei Wang,
 SeongJae Park, Song Liu, Vlastimil Babka, Chris Kennelly,
 Kirill A. Shutemov, Minchan Kim, Patrick Xia, Zach O'Keefe
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956666

The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of a TLB hit and spending fewer cycles in page
table walks. pte-mapped hugepages, that is, hugepage-aligned compound
pages of order HPAGE_PMD_ORDER, don't have this advantage even though
they are contiguous in physical memory. In fact, one could argue they
are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively. Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes
no effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path. In
khugepaged context, attempt to update page tables mapping this hugepage.

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage is also mapped into multiple processes'
address spaces, the counter could be incremented for each page table
update. Since we increment the counter when a pte-mapped hugepage is
successfully added to the list of to-collapse pte-mapped THPs, it's
possible that we never actually update the page table either. This is
different from how file/shmem pages_collapsed accounting works today,
where only a successful page cache update is counted (it's also possible
here that no page tables are actually changed).
Though it incurs some slop, this is preferred to either not accounting
for the event at all, or plumbing through data in struct mm_slot on
whether to account for the collapse or not.

Note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

Signed-off-by: Zach O'Keefe
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 49 ++++++++++++++++++++++++++----
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 55392bf30a03..fbbb25494d60 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -17,6 +17,7 @@
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")		\
 	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")			\
+	EM( SCAN_PTE_MAPPED_HUGEPAGE,	"pte_mapped_hugepage")		\
 	EM( SCAN_PAGE_RO,		"no_writable_page")		\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")		\
 	EM( SCAN_PAGE_NULL,		"page_null")			\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d8e388106322..6022a08db1cd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -34,6 +34,7 @@ enum scan_result {
 	SCAN_EXCEED_SHARED_PTE,
 	SCAN_PTE_NON_PRESENT,
 	SCAN_PTE_UFFD_WP,
+	SCAN_PTE_MAPPED_HUGEPAGE,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -1349,18 +1350,22 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
  * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
  * khugepaged should try to collapse the page table.
  */
-static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 					  unsigned long addr)
 {
 	struct mm_slot *mm_slot;
+	bool ret = false;

 	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);

 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = get_mm_slot(mm);
-	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
+	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
 		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
+		ret = true;
+	}
 	spin_unlock(&khugepaged_mm_lock);
+	return ret;
 }

 static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -1397,9 +1402,16 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	pte_t *start_pte, *pte;
 	pmd_t *pmd;
 	spinlock_t *ptl;
-	int count = 0;
+	int count = 0, result = SCAN_FAIL;
 	int i;

+	mmap_assert_write_locked(mm);
+
+	/* Fast check before locking page if already PMD-mapped */
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
+	if (result != SCAN_SUCCEED)
+		return;
+
 	if (!vma || !vma->vm_file ||
 	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
 		return;
@@ -1748,7 +1760,11 @@ static int collapse_file(struct mm_struct *mm, struct file *file,
 		 * we locked the first page, then a THP might be there already.
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			result = compound_order(page) == HPAGE_PMD_ORDER &&
+					index == start
+					/* Maybe PMD-mapped */
+					? SCAN_PTE_MAPPED_HUGEPAGE
+					: SCAN_PAGE_COMPOUND;
 			goto out_unlock;
 		}

@@ -1986,7 +2002,11 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 		 * into a PMD sized page
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			result = compound_order(page) == HPAGE_PMD_ORDER &&
+					xas.xa_index == start
+					/* Maybe PMD-mapped */
+					? SCAN_PTE_MAPPED_HUGEPAGE
+					: SCAN_PAGE_COMPOUND;
 			break;
 		}

@@ -2046,6 +2066,12 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 {
 }
+
+static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+					  unsigned long addr)
+{
+	return false;
+}
 #endif

 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2137,8 +2163,19 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 								  &mmap_locked, cc);
 			}

-			if (*result == SCAN_SUCCEED)
+			switch (*result) {
+			case SCAN_PTE_MAPPED_HUGEPAGE:
+				if (!khugepaged_add_pte_mapped_thp(mm,
+								  khugepaged_scan.address))
+					break;
+				fallthrough;
+			case SCAN_SUCCEED:
 				++khugepaged_pages_collapsed;
+				break;
+			default:
+				break;
+			}
+
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
 			progress += HPAGE_PMD_NR;
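
For readers who want to reason about the accounting described in the
commit message without a kernel tree at hand, the following is a minimal
userspace sketch of the two decisions the patch introduces: classifying a
compound page found during the file/shmem scan, and deciding whether
pages_collapsed is bumped. It is not kernel code. The model_* names, the
reduced enum, struct model_slot, and the constant values are all
assumptions made for illustration; locking, the page cache, and the real
page table update are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the kernel derives these per architecture. */
#define MODEL_HPAGE_PMD_ORDER	9
#define MODEL_HPAGE_PMD_SIZE	(1UL << (12 + MODEL_HPAGE_PMD_ORDER))
#define MODEL_HPAGE_PMD_MASK	(~(MODEL_HPAGE_PMD_SIZE - 1))
#define MODEL_MAX_PTE_MAPPED_THP 8

/* Reduced stand-in for the kernel's enum scan_result. */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_PAGE_COMPOUND,
	SCAN_PTE_MAPPED_HUGEPAGE,
};

/* Hypothetical stand-in for struct mm_slot: only the fields modeled here. */
struct model_slot {
	int nr_pte_mapped_thp;
	unsigned long pte_mapped_thp[MODEL_MAX_PTE_MAPPED_THP];
};

/*
 * Models the new ternary: a compound page of order HPAGE_PMD_ORDER found
 * at the first index of the range may just be pte-mapped; any other
 * compound page is still reported as SCAN_PAGE_COMPOUND.
 */
static enum scan_result model_classify_compound(unsigned int order,
						unsigned long index,
						unsigned long start)
{
	return (order == MODEL_HPAGE_PMD_ORDER && index == start)
		? SCAN_PTE_MAPPED_HUGEPAGE
		: SCAN_PAGE_COMPOUND;
}

/* Queue addr for a later page table update; false if no slot or slot full. */
static bool model_add_pte_mapped_thp(struct model_slot *slot,
				     unsigned long addr)
{
	/* Stands in for VM_BUG_ON(addr & ~HPAGE_PMD_MASK). */
	assert((addr & ~MODEL_HPAGE_PMD_MASK) == 0);

	if (!slot || slot->nr_pte_mapped_thp >= MODEL_MAX_PTE_MAPPED_THP)
		return false;
	slot->pte_mapped_thp[slot->nr_pte_mapped_thp++] = addr;
	return true;
}

/*
 * Models the new switch in the scan loop and returns the updated counter.
 * The counter moves as soon as the address is queued, before any page
 * table is touched, which is the "slop" the commit message mentions.
 */
static unsigned long model_account(enum scan_result result,
				   struct model_slot *slot,
				   unsigned long addr,
				   unsigned long pages_collapsed)
{
	switch (result) {
	case SCAN_PTE_MAPPED_HUGEPAGE:
		if (!model_add_pte_mapped_thp(slot, addr))
			break;
		/* fall through */
	case SCAN_SUCCEED:
		++pages_collapsed;
		break;
	default:
		break;
	}
	return pages_collapsed;
}
```

In this model, a SCAN_PTE_MAPPED_HUGEPAGE result with a full or missing
slot leaves the counter unchanged, while a successfully queued address
counts exactly like SCAN_SUCCEED even though no page table has been
updated yet.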