From patchwork Wed Mar 8 09:41:01 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 13165509
From: Mike Rapoport
To: linux-mm@kvack.org
Cc: Andrew Morton, Dave Hansen, Mike Rapoport, Peter Zijlstra,
 Rick Edgecombe, Song Liu, Thomas Gleixner, Vlastimil Babka,
 linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 0/5] Prototype for direct map awareness in page allocator
Date: Wed, 8 Mar 2023 11:41:01 +0200
Message-Id: <20230308094106.227365-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Hi,

This is a third attempt to make the page allocator aware of the direct map
layout and to allow grouping of the pages that must be unmapped from the
direct map.

This is a new implementation of __GFP_UNMAPPED, kind of a follow-up to
this set:

https://lore.kernel.org/all/20220127085608.306306-1-rppt@kernel.org

but instead of using a migrate type to cache the unmapped pages, the
current implementation adds a dedicated cache to serve __GFP_UNMAPPED
allocations.

The last two patches in the series demonstrate how __GFP_UNMAPPED can be
used in two in-tree use cases. The first one switches secretmem to the new
mechanism, which is a straightforward optimization. The second one enables
__GFP_UNMAPPED in x86::module_alloc(), which is essentially used as a way
to allocate code pages and thus requires permission changes for basic
pages in the direct map.

This set is x86-specific at the moment because other architectures either
do not support set_memory APIs that split the direct^w linear map (e.g.
PowerPC) or only enable set_memory APIs when the linear map uses basic
page size (like arm64).
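To make the intended usage concrete, below is a rough sketch of what the
x86::module_alloc() conversion could look like. This is illustrative, not
the actual patch: the KASLR load offset and KASAN shadow handling of the
real function are omitted, and the only functional change shown is passing
the new __GFP_UNMAPPED flag down to vmalloc:

	void *module_alloc(unsigned long size)
	{
		if (PAGE_ALIGN(size) > MODULES_LEN)
			return NULL;

		/* __GFP_UNMAPPED is the new flag proposed by this series */
		return __vmalloc_node_range(size, MODULE_ALIGN,
					    MODULES_VADDR, MODULES_END,
					    GFP_KERNEL | __GFP_UNMAPPED,
					    PAGE_KERNEL, VM_DEFER_KMEMLEAK,
					    NUMA_NO_NODE,
					    __builtin_return_address(0));
	}

With this, the pages backing module text would come from the unmapped
cache instead of causing a fresh 4K split of the direct map on every
module load.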
The patches are only lightly tested.

== Motivation ==

There are use cases that need to remove pages from the direct map or at
least map them with 4K granularity. Whenever this is done, e.g. with the
set_memory or set_direct_map APIs, the PUD- and PMD-sized mappings in the
direct map are split into smaller pages.

To reduce the performance hit caused by the fragmentation of the direct
map, it makes sense to group and/or cache the pages removed from the
direct map so that the split large pages won't be all over the place.

There were RFCs for grouped page allocations for vmalloc permissions [1]
and for using PKS to protect page tables [2], as well as an attempt to use
a pool of large pages in secretmem [3], but these suggestions address each
use case separately, while a common mechanism at the core mm level could
be used by all of them.

== Implementation overview ==

The pages that need to be removed from the direct map are grouped in a
dedicated cache. When there is a page allocation request with
__GFP_UNMAPPED set, it is redirected from __alloc_pages() to that cache
using a new unmapped_alloc() function.

The cache is implemented as a buddy allocator, so it can handle high-order
requests.

The cache starts empty, and whenever it does not have enough pages to
satisfy an allocation request it attempts to allocate a PMD_SIZE page to
replenish itself. If a PMD_SIZE page cannot be allocated, the cache is
replenished with a page of the highest order available. That page is
removed from the direct map and added to the local buddy allocator.

There is also a shrinker that releases pages from the unmapped cache when
there is memory pressure in the system. When the shrinker releases a page,
it is mapped back into the direct map.

[1] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com
[2] https://lore.kernel.org/lkml/20210505003032.489164-1-rick.p.edgecombe@intel.com
[3] https://lore.kernel.org/lkml/20210121122723.3446-8-rppt@kernel.org

Mike Rapoport (IBM) (5):
  mm: introduce __GFP_UNMAPPED and unmapped_alloc()
  mm/unmapped_alloc: add debugfs file similar to /proc/pagetypeinfo
  mm/unmapped_alloc: add shrinker
  EXPERIMENTAL: x86: use __GFP_UNMAPPED for module_alloc()
  EXPERIMENTAL: mm/secretmem: use __GFP_UNMAPPED

 arch/x86/Kconfig                |   3 +
 arch/x86/kernel/module.c        |   2 +-
 include/linux/gfp_types.h       |  11 +-
 include/linux/page-flags.h      |   6 +
 include/linux/pageblock-flags.h |  28 +++
 include/trace/events/mmflags.h  |  10 +-
 mm/Kconfig                      |   4 +
 mm/Makefile                     |   1 +
 mm/internal.h                   |  24 +++
 mm/page_alloc.c                 |  39 +++-
 mm/secretmem.c                  |  26 +--
 mm/unmapped-alloc.c             | 334 ++++++++++++++++++++++++++++++++
 mm/vmalloc.c                    |   2 +-
 13 files changed, 459 insertions(+), 31 deletions(-)
 create mode 100644 mm/unmapped-alloc.c

base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
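P.S. For illustration, here is a condensed sketch of the cache replenish
path described in the overview above. The names unmapped_cache_replenish()
and unmapped_free_pages() are invented for exposition and are not code
from the series; TLB flushing and error handling are omitted.
set_direct_map_invalid_noflush() is the existing kernel API that drops a
page from the direct map:

	static bool unmapped_cache_replenish(gfp_t gfp)
	{
		unsigned int order = PMD_SHIFT - PAGE_SHIFT;
		struct page *page;
		int i;

		/* prefer a PMD_SIZE page, fall back to smaller orders */
		for (;; order--) {
			page = alloc_pages(gfp | __GFP_NOWARN, order);
			if (page)
				break;
			if (!order)
				return false;
		}

		/* remove the freshly allocated pages from the direct map ... */
		for (i = 0; i < (1 << order); i++)
			set_direct_map_invalid_noflush(page + i);

		/* ... and hand them to the cache's local buddy allocator */
		unmapped_free_pages(page, order);
		return true;
	}

The shrinker does the reverse: under memory pressure it pulls pages out of
this cache, restores their direct map entries (the existing API for that
is set_direct_map_default_noflush()), and returns them to the page
allocator.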