From: Oscar Salvador
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Peter Xu,
    Muchun Song, David Hildenbrand, SeongJae Park, Miaohe Lin,
    Michal Hocko, Matthew Wilcox, Christophe Leroy, Oscar Salvador
Subject: [PATCH 00/45] hugetlb pagewalk unification
Date: Thu, 4 Jul 2024 06:30:47 +0200
Message-ID: <20240704043132.28501-1-osalvador@suse.de>
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 13723178

Hi all,

During Peter's talk at the LSFMM, it was agreed that one of the things
that need to be done in
order to further integrate hugetlb into mm core, is to unify generic and
hugetlb pagewalkers.

I started with this one, which folds hugetlb into the generic pagewalk
instead of having its own hugetlb_entry entries. This means that
pmd_entry/pte_entry (for cont-pte) entries will also deal with hugetlb
vmas, and so will new pud_entry entries, since hugetlb can be pud mapped
(devm pages as well, but we seem not to care about those with the
exception of hmm code).

The outcome is this RFC.

Before you continue, let me clarify certain points:

This patchset is not yet finished, as 1) some things need more thought,
2) some are still broken (like the hmm bits, as I am clueless about
that), and 3) some paths have not been tested at all.

The things I tested were:

 - memory-failure
 - smaps/numa_maps/pagemap (the latter only for pud/pmd, not
   cont-{pmd,pte})
 - mempolicy

on arm64 (for 64KB and 32MB hugetlb pages) and on x86_64 (for 2MB and
1GB hugetlb pages).

More tests need to be conducted, and I plan to borrow a ppc64le machine
to also carry out some tests there, but for now this is what my
bandwidth allowed me to do.

I am well aware that there are two things that might scare people, one
being the number of patches, and the other being the amount of code
added.

For the former, I will by no means ask anyone to review 45 patches, but
since this patchset touches isolated paths (damon, mincore, hmm,
task_mmu, memory-failure, mempolicy), I will point out some people that
might be able to help me out with those different bits:

 - Miaohe for memory-failure bits
 - David for task_mmu bits
 - SeongJae Park for damon bits
 - Jerome for hmm bits
 - feel free to join for the rest

I think that might be a good approach: instead of having to review 45
patches, one only has to review 5 or 6 at most.

For the latter, there is an explanation: hugetlb operates on ptes
(although it allocates puds/pmds and the operations work on that level
too), which means that now that we will handle PUD/PMD-mapped hugetlb
with {pud,pmd}_* operations, we need to introduce quite a few functions
that do not exist yet and that we will need from now on.

Although I am sending this out, this is not "RFC-ready material": as I
said, there are still things that need to be improved/fixed/tested, but
I wanted to make it public nevertheless so we can gather some
constructive feedback that helps us move in the right direction, and
also to widen the discussion.
So take this more as a "Hey, let me show what I am doing, and call me
out on things you consider wrong".

Thanks in advance

Oscar Salvador (45):
  arch/x86: Drop own definition of pgd,p4d_leaf
  mm: Add {pmd,pud}_huge_lock helper
  mm/pagewalk: Move vma_pgtable_walk_begin and vma_pgtable_walk_end
    upfront
  mm/pagewalk: Only call pud_entry when we have a pud leaf
  mm/pagewalk: Enable walk_pmd_range to handle cont-pmds
  mm/pagewalk: Do not try to split non-thp pud or pmd leafs
  arch/s390: Enable __s390_enable_skey_pmd to handle hugetlb vmas
  fs/proc: Enable smaps_pmd_entry to handle PMD-mapped hugetlb vmas
  mm: Implement pud-version functions for swap and vm_normal_page_pud
  fs/proc: Create smaps_pud_range to handle PUD-mapped hugetlb vmas
  fs/proc: Enable smaps_pte_entry to handle cont-pte mapped hugetlb vmas
  fs/proc: Enable pagemap_pmd_range to handle hugetlb vmas
  mm: Implement pud-version uffd functions
  fs/proc: Create pagemap_pud_range to handle PUD-mapped hugetlb vmas
  fs/proc: Adjust pte_to_pagemap_entry for hugetlb vmas
  fs/proc: Enable pagemap_scan_pmd_entry to handle hugetlb vmas
  mm: Implement pud-version for pud_mkinvalid and pudp_establish
  fs/proc: Create pagemap_scan_pud_entry to handle PUD-mapped hugetlb
    vmas
  fs/proc: Enable gather_pte_stats to handle hugetlb vmas
  fs/proc: Enable gather_pte_stats to handle cont-pte mapped hugetlb
    vmas
  fs/proc: Create gather_pud_stats to handle PUD-mapped hugetlb pages
  mm/mempolicy: Enable queue_folios_pmd to handle hugetlb vmas
  mm/mempolicy: Create queue_folios_pud to handle PUD-mapped hugetlb
    vmas
  mm/memory_failure: Enable check_hwpoisoned_pmd_entry to handle hugetlb
    vmas
  mm/memory-failure: Create check_hwpoisoned_pud_entry to handle
    PUD-mapped hugetlb vmas
  mm/damon: Enable damon_young_pmd_entry to handle hugetlb vmas
  mm/damon: Create damon_young_pud_entry to handle PUD-mapped hugetlb
    vmas
  mm/damon: Enable damon_mkold_pmd_entry to handle hugetlb vmas
  mm/damon: Create damon_mkold_pud_entry to handle PUD-mapped hugetlb
    vmas
  mm,mincore: Enable mincore_pte_range to handle hugetlb vmas
  mm/mincore: Create mincore_pud_range to handle PUD-mapped hugetlb vmas
  mm/hmm: Enable hmm_vma_walk_pmd, to handle hugetlb vmas
  mm/hmm: Enable hmm_vma_walk_pud to handle PUD-mapped hugetlb vmas
  arch/powerpc: Skip hugetlb vmas in subpage_mark_vma_nohuge
  arch/s390: Skip hugetlb vmas in thp_split_mm
  fs/proc: Make clear_refs_test_walk skip hugetlb vmas
  mm/lock: Make mlock_test_walk skip hugetlb vmas
  mm/madvise: Make swapin_test_walk skip hugetlb vmas
  mm/madvise: Make madvise_cold_test_walk skip hugetlb vmas
  mm/madvise: Make madvise_free_test_walk skip hugetlb vmas
  mm/migrate_device: Make migrate_vma_test_walk skip hugetlb vmas
  mm/memcontrol: Make mem_cgroup_move_test_walk skip hugetlb vmas
  mm/memcontrol: Make mem_cgroup_count_test_walk skip hugetlb vmas
  mm/hugetlb_vmemmap: Make vmemmap_test_walk skip hugetlb vmas
  mm: Delete all hugetlb_entry entries

 arch/arm64/include/asm/pgtable.h             |  19 +
 arch/loongarch/include/asm/pgtable.h         |   8 +
 arch/mips/include/asm/pgtable.h              |   7 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |   8 +-
 arch/powerpc/mm/book3s64/pgtable.c           |  15 +-
 arch/powerpc/mm/book3s64/subpage_prot.c      |   2 +
 arch/riscv/include/asm/pgtable.h             |  15 +
 arch/s390/mm/gmap.c                          |  37 +-
 arch/x86/include/asm/pgtable.h               | 199 +++++----
 fs/proc/task_mmu.c                           | 434 ++++++++++++-------
 include/asm-generic/pgtable_uffd.h           |  30 ++
 include/linux/mm.h                           |   4 +
 include/linux/mm_inline.h                    |  34 ++
 include/linux/pagewalk.h                     |  10 -
 include/linux/pgtable.h                      |  77 +++-
 include/linux/swapops.h                      |  27 ++
 mm/damon/ops-common.c                        |  21 +-
 mm/damon/vaddr.c                             | 173 ++++----
 mm/hmm.c                                     |  69 +--
 mm/hugetlb_vmemmap.c                         |  12 +
 mm/madvise.c                                 |  36 ++
 mm/memcontrol-v1.c                           |  24 +
 mm/memory-failure.c                          |  99 +++--
 mm/memory.c                                  |  51 +++
 mm/mempolicy.c                               | 121 +++---
 mm/migrate_device.c                          |  12 +
 mm/mincore.c                                 |  46 +-
 mm/mlock.c                                   |  12 +
 mm/mprotect.c                                |  10 -
 mm/pagewalk.c                                |  73 +---
 mm/pgtable-generic.c                         |  21 +
 31 files changed, 1089 insertions(+), 617 deletions(-)
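
For readers who want a feel for the direction described in the cover
letter, here is a minimal userspace sketch, not kernel code. It models
a walker that hands hugetlb vmas to the same pud_entry/pmd_entry/pte_entry
callbacks as everything else (no separate hugetlb_entry), and a
test_walk callback that skips hugetlb vmas for users that do not care
about them. Every name in it (toy_vma, toy_walk_ops, toy_walk_vma,
toy_stats and the helpers) is hypothetical and invented for this
illustration. The "positive return from test_walk means skip this vma"
convention is an assumption of the toy, chosen to resemble how the
series' *_test_walk patches are described.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model: none of these types or functions are kernel APIs. */
enum toy_level { TOY_PTE, TOY_PMD, TOY_PUD };

struct toy_vma {
	bool hugetlb;          /* models a hugetlb vma */
	enum toy_level level;  /* level at which the vma is leaf-mapped */
};

struct toy_walk_ops {
	/* Toy convention: >0 skips the vma, <0 aborts, 0 walks it. */
	int (*test_walk)(const struct toy_vma *vma, void *priv);
	int (*pud_entry)(const struct toy_vma *vma, void *priv);
	int (*pmd_entry)(const struct toy_vma *vma, void *priv);
	int (*pte_entry)(const struct toy_vma *vma, void *priv);
};

struct toy_stats {
	int pud_leaves;
	int pmd_leaves;
	int ptes;
};

/* Dispatch on the mapping level only; hugetlb gets no special callback. */
static int toy_walk_vma(const struct toy_vma *vma,
			const struct toy_walk_ops *ops, void *priv)
{
	int (*entry)(const struct toy_vma *, void *) = NULL;

	if (ops->test_walk) {
		int ret = ops->test_walk(vma, priv);

		if (ret > 0)
			return 0;	/* vma skipped, walk continues */
		if (ret < 0)
			return ret;	/* walk aborted */
	}

	switch (vma->level) {
	case TOY_PUD: entry = ops->pud_entry; break;
	case TOY_PMD: entry = ops->pmd_entry; break;
	case TOY_PTE: entry = ops->pte_entry; break;
	}
	return entry ? entry(vma, priv) : 0;
}

static int toy_count_pud(const struct toy_vma *vma, void *priv)
{
	(void)vma;
	((struct toy_stats *)priv)->pud_leaves++;
	return 0;
}

static int toy_count_pmd(const struct toy_vma *vma, void *priv)
{
	(void)vma;
	((struct toy_stats *)priv)->pmd_leaves++;
	return 0;
}

static int toy_count_pte(const struct toy_vma *vma, void *priv)
{
	(void)vma;
	((struct toy_stats *)priv)->ptes++;
	return 0;
}

/* Models the "Make *_test_walk skip hugetlb vmas" pattern. */
static int toy_skip_hugetlb(const struct toy_vma *vma, void *priv)
{
	(void)priv;
	return vma->hugetlb ? 1 : 0;
}

/* A user that handles every level, hugetlb included. */
static const struct toy_walk_ops toy_stats_ops = {
	.pud_entry = toy_count_pud,
	.pmd_entry = toy_count_pmd,
	.pte_entry = toy_count_pte,
};

/* A user that only cares about non-hugetlb ptes. */
static const struct toy_walk_ops toy_skip_ops = {
	.test_walk = toy_skip_hugetlb,
	.pte_entry = toy_count_pte,
};
```

Walking a PUD-mapped and a PMD-mapped hugetlb vma with toy_stats_ops
bumps pud_leaves and pmd_leaves respectively, while toy_skip_ops leaves
the counters untouched for both and only counts the non-hugetlb vma.
The real series has far more to deal with (locking, cont-pte/cont-pmd
ranges, per-architecture helpers), none of which this sketch attempts
to model.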