From: Zach O'Keefe
Date: Sun, 10 Apr 2022 06:54:33 -0700
Message-Id: <20220410135445.3897054-1-zokeefe@google.com>
X-Patchwork-Id: 12808144
Subject: [PATCH 00/12] mm: userspace hugepage collapse
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
    Michal Hocko, Pasha Tatashin, SeongJae Park, Song Liu,
    Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
    Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
    Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
    Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Thomas Bogendoerfer, Zach O'Keefe

Introduction
--------------------------------

This series provides a mechanism for userspace to induce a collapse of
eligible ranges of memory into transparent hugepages in process context,
thus permitting users to more tightly control their own hugepage
utilization policy at their own expense.

This idea was introduced by David Rientjes[1], and the semantics and
implementation were introduced and discussed in a previous PATCH RFC[2].

Interface
--------------------------------

The proposed interface adds a new madvise(2) mode, MADV_COLLAPSE, and
leverages the new process_madvise(2) call.

(*) process_madvise(2)

        Performs a synchronous collapse of the native pages mapped by
        the list of iovecs into transparent hugepages.

        Allocation semantics are the same as khugepaged, and depend on
        (1) the active sysfs settings
        /sys/kernel/mm/transparent_hugepage/enabled and
        /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and
        (2) the VMA flags of the memory range being collapsed.

        Collapse eligibility criteria differ from khugepaged in that
        the sysfs files
        /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_[none|swap|shared]
        are ignored.

        When a range spans multiple hugepage-aligned/sized regions, the
        collapse of each region is independent of the others.

        Caller must have CAP_SYS_ADMIN if not acting on self.

        Return value follows existing process_madvise(2) conventions. A
        “success” indicates that all hugepage-sized/aligned regions
        covered by the provided range were either successfully
        collapsed, or were already pmd-mapped THPs.

(*) madvise(2)

        Equivalent to process_madvise(2) on self, with 0 returned on
        “success”.

Future work
--------------------------------

Only private anonymous memory is supported by this series. File and
shmem memory support will be added later.

One possible user of this functionality is a userspace agent that
attempts to optimize THP utilization system-wide by allocating THPs
based on, for example, task priority, task performance requirements, or
heatmaps. For the latter, one idea that has already surfaced is using
DAMON to identify hot regions, and driving THP collapse through a new
DAMOS_COLLAPSE scheme[3].

Sequence of Patches
--------------------------------

Patches 1-4 refactor the collapse logic within khugepaged.c and
introduce the notion of a collapse context.

Patches 5-9 introduce MADV_COLLAPSE, do some renaming, add support so
that the MADV_COLLAPSE context has the eligibility and allocation
semantics referenced above, and add process_madvise(2) support.

Patches 10-12 add selftests to test the new functionality.

Applies against next-20220408.

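As an illustration of the madvise(2) interface described under
"Interface", here is a minimal userspace sketch. It rests on several
assumptions that are not stated in this cover letter: the numeric value
of MADV_COLLAPSE (25 is the value used by later mainline kernels, and
this series may define it differently), a 2 MiB PMD-sized hugepage, and
a reading of "covered" as "fully contained in the range". The helper
names hpage_align_up(), hpage_align_down(), hpage_regions_covered() and
collapse_self() are hypothetical and exist only for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: value from later mainline kernels; this series may differ. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Assumption: 2 MiB PMD-sized hugepages (e.g. x86-64 with 4 KiB pages). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/* Round an address up to the next hugepage boundary. */
uintptr_t hpage_align_up(uintptr_t addr)
{
	return (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
}

/* Round an address down to the previous hugepage boundary. */
uintptr_t hpage_align_down(uintptr_t addr)
{
	return addr & ~(HPAGE_SIZE - 1);
}

/*
 * Count the hugepage-aligned, hugepage-sized regions fully contained in
 * [start, start + len). Under the assumptions above, these are the
 * regions a collapse request over that range would act on, each one
 * independently of the others.
 */
size_t hpage_regions_covered(uintptr_t start, size_t len)
{
	uintptr_t first, last;

	assert(start + len >= start);	/* range must not wrap around */
	first = hpage_align_up(start);
	last = hpage_align_down(start + len);
	return last > first ? (last - first) / HPAGE_SIZE : 0;
}

/*
 * Request a synchronous collapse of [addr, addr + len) in the calling
 * process. Returns 0 on "success", or -1 with errno set, following the
 * usual madvise(2) convention.
 */
int collapse_self(void *addr, size_t len)
{
	return madvise(addr, len, MADV_COLLAPSE);
}
```

A caller would typically mmap() a private anonymous region, touch it so
that native pages are present, and then call collapse_self() on it. On a
kernel without this series the call is expected to fail, most likely
with EINVAL, so the return value should always be checked.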
[1] https://lore.kernel.org/all/C8C89F13-3F04-456B-BA76-DE2C378D30BF@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20220308213417.1407042-1-zokeefe@google.com/
[3] https://lore.kernel.org/lkml/bcc8d9a0-81d-5f34-5e4-fcc28eb7ce@google.com/T/

Zach O'Keefe (13):
  mm/khugepaged: separate hugepage preallocation and cleanup
  mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP
  mm/khugepaged: add struct collapse_control
  mm/khugepaged: make hugepage allocation context-specific
  mm/khugepaged: add struct collapse_result
  mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
  mm/khugepaged: remove khugepaged prefix from shared collapse functions
  mm/khugepaged: add flag to ignore khugepaged_max_ptes_*
  mm/khugepaged: add flag to ignore page young/referenced requirement
  mm/madvise: add MADV_COLLAPSE to process_madvise()
  selftests/vm: modularize collapse selftests
  selftests/vm: add MADV_COLLAPSE collapse context to selftests
  selftests/vm: add test to verify recollapse of THPs

 include/linux/huge_mm.h                 |  12 +
 include/trace/events/huge_memory.h      |   5 +-
 include/uapi/asm-generic/mman-common.h  |   2 +
 mm/internal.h                           |   1 +
 mm/khugepaged.c                         | 598 ++++++++++++++++--------
 mm/madvise.c                            |  11 +-
 mm/rmap.c                               |  15 +-
 tools/testing/selftests/vm/khugepaged.c | 417 +++++++++++------
 8 files changed, 702 insertions(+), 359 deletions(-)