From patchwork Thu Sep 22 01:12:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 12984352
From: Zi Yan
To: linux-mm@kvack.org
Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
 "Kirill A . Shutemov", Mike Kravetz, John Hubbard, Yang Shi,
 David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
 Andrew Morton, linux-kernel@vger.kernel.org
Subject: [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Wed, 21 Sep 2022 21:12:40 -0400
Message-Id: <20220922011252.2266780-1-zi.yan@sent.com>
X-Mailer: git-send-email 2.35.1
Reply-To: Zi Yan
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Zi Yan

Hi all,

This patchset adds support for a kernel boot time adjustable MAX_ORDER, so
that users can change the largest page size the buddy allocator allocates.
It is on top of mm-everything-2022-09-19-00-45.

Changelog
===

From RFCv2
1. Dropped RFC, collected reviewed-by.
2. Added back page validation check in find_buddy_page_pfn() since it is
   needed when zone is not contiguous.
3. Converted MAX_ORDER sized static array used in recently added kmsan code
   to a dynamic one.

Motivation
===

This enables the kernel to allocate 1GB pages and is necessary for my ongoing
work on adding support for 1GB PUD THP[1]. This is also the conclusion I
came to after some discussion with David Hildenbrand on what methods should
be used for allocating gigantic pages[2], since other approaches like using
the CMA allocator or alloc_contig_pages() are regarded as suboptimal.

In addition, making MAX_ORDER a kernel boot time parameter lets users adjust
the buddy allocator for their own needs without recompiling the kernel, so
that one can still have a small MAX_ORDER if they do not need to allocate
gigantic pages like 1GB PUD THPs.

Background
===

At the moment, the kernel imposes the
MAX_ORDER - 1 + PAGE_SHIFT < SECTION_SIZE_BITS restriction. This prevents the
buddy allocator from merging pages across memory sections, as PFNs might not
be contiguous and code like page++ would fail. But this is not an issue when
SPARSEMEM_VMEMMAP is set, since all struct pages are virtually contiguous.
So boot time adjustable MAX_ORDER depends on SPARSEMEM_VMEMMAP.

Description
===

I tested the patchset on both x86_64 and ARM64 with 4KB base pages. The
systems boot and run.
Regarding the concerns about performance degradation when MAX_ORDER is
increased, I ran vm-scalability from lkp on an x86_64 VM, comparing the
current system, my patchset with MAX_ORDER=11, and my patchset with
MAX_ORDER=20, and saw almost no performance difference. Please see the
vm-scalability reports in the RFCv2:
https://lore.kernel.org/linux-mm/20220811231643.1012912-1-zi.yan@sent.com/

Patch 1 changes MAX_ORDER to represent the max order of pages allocated by
the buddy allocator. Right now MAX_ORDER - 1 represents that, which is
confusing. Suggested by Vlastimil Babka. checkpatch.pl is updated to warn
about future uses of MAX_ORDER, since its semantics have changed.

Patch 2 adds a page validation check in find_buddy_page_pfn() when the zone
is not contiguous, since some pages in the middle of a zone can be invalid.

Patch 3 makes deferred struct page initialization work when MAX_ORDER is
bigger than a memory section size.

Patches 4-7 convert uses of MAX_ORDER to pageblock_order, since
pageblock_order remains a constant when MAX_ORDER can be changed at boot
time and is close to the current MAX_ORDER value. I split the changes into
separate patches for easier review and can merge them into a single one if
that works better.

Patch 8 replaces MAX_ORDER with MAX_PHYS_CONTIG_ORDER when it is used to
indicate the maximum number of physically contiguous pages.

Patch 9 adds a new Kconfig option SET_MAX_ORDER to allow specifying
MAX_ORDER when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.

Patch 10 converts statically allocated arrays with MAX_ORDER length to
dynamic ones where possible and prepares for making MAX_ORDER a boot time
parameter.

Patch 11 adds a new MIN_MAX_ORDER constant to replace the soon-to-be-dynamic
MAX_ORDER in places where converting a static array to a dynamic one is a
hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.

Patch 12 changes MAX_ORDER to be a kernel boot time parameter; it is opt-in
as an mm/Kconfig option.

Any suggestions and/or comments are welcome. Thanks.

[1] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/e132fdd9-65af-1cad-8a6e-71844ebfe6a2@redhat.com/

Zi Yan (12):
  mm: rectify MAX_ORDER semantics to be the largest page order from
    buddy allocator
  mm: check page validity when find a buddy page in a non-contiguous
    zone
  mm: adapt deferred struct page init to new MAX_ORDER.
  mm: prevent pageblock size being larger than section size.
  fs: proc: use pageblock_nr_pages for reschedule period in
    read_kcore()
  virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  mm/page_reporting: set page_reporting_order to -1 to prevent it
    running
  mm: replace MAX_ORDER when it is used to indicate max physical
    contiguity.
  mm: Make MAX_ORDER of buddy allocator configurable via Kconfig
    SET_MAX_ORDER.
  mm: convert MAX_ORDER sized static arrays to dynamic ones.
  mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time
    constant.
  mm: make MAX_ORDER a kernel boot time parameter.

 .../admin-guide/kdump/vmcoreinfo.rst         |   4 +-
 .../admin-guide/kernel-parameters.txt        |   9 +-
 arch/Kconfig                                 |   4 +
 arch/arc/Kconfig                             |   4 +-
 arch/arm/Kconfig                             |  12 +-
 arch/arm/configs/imx_v6_v7_defconfig         |   2 +-
 arch/arm/configs/milbeaut_m10v_defconfig     |   2 +-
 arch/arm/configs/oxnas_v6_defconfig          |   2 +-
 arch/arm/configs/pxa_defconfig               |   2 +-
 arch/arm/configs/sama7_defconfig             |   2 +-
 arch/arm/configs/sp7021_defconfig            |   2 +-
 arch/arm64/Kconfig                           |  16 +--
 arch/arm64/include/asm/sparsemem.h           |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h        |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c         |   2 +-
 arch/csky/Kconfig                            |   2 +-
 arch/ia64/Kconfig                            |   8 +-
 arch/ia64/include/asm/sparsemem.h            |   4 +-
 arch/ia64/mm/hugetlbpage.c                   |   2 +-
 arch/loongarch/Kconfig                       |  16 +--
 arch/m68k/Kconfig.cpu                        |   8 +-
 arch/mips/Kconfig                            |  22 ++-
 arch/nios2/Kconfig                           |  10 +-
 arch/powerpc/Kconfig                         |  30 ++---
 arch/powerpc/configs/85xx/ge_imp3a_defconfig |   2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config    |   2 +-
 arch/powerpc/mm/book3s64/iommu_api.c         |   2 +-
 arch/powerpc/mm/hugetlbpage.c                |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c    |   2 +-
 arch/sh/configs/ecovec24_defconfig           |   2 +-
 arch/sh/mm/Kconfig                           |  20 ++-
 arch/sparc/Kconfig                           |   8 +-
 arch/sparc/kernel/pci_sun4v.c                |   2 +-
 arch/sparc/kernel/traps_64.c                 |   2 +-
 arch/sparc/mm/tsb.c                          |   4 +-
 arch/um/kernel/um_arch.c                     |   4 +-
 arch/xtensa/Kconfig                          |   8 +-
 drivers/base/regmap/regmap-debugfs.c         |   8 +-
 drivers/crypto/hisilicon/sgl.c               |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c  |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c             |   7 +-
 drivers/gpu/drm/ttm/ttm_pool.c               |  72 ++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h  |   2 +-
 drivers/irqchip/irq-gic-v3-its.c             |   4 +-
 drivers/md/dm-bufio.c                        |   2 +-
 drivers/misc/genwqe/card_utils.c             |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c  |   2 +-
 drivers/net/ethernet/ibm/ibmvnic.h           |   2 +-
 drivers/video/fbdev/hyperv_fb.c              |   6 +-
 drivers/virtio/virtio_balloon.c              |   2 +-
 drivers/virtio/virtio_mem.c                  |   8 +-
 fs/proc/kcore.c                              |   2 +-
 fs/ramfs/file-nommu.c                        |   2 +-
 include/drm/ttm/ttm_pool.h                   |   4 +-
 include/linux/hugetlb.h                      |   2 +-
 include/linux/mmzone.h                       |  36 ++++-
 include/linux/pageblock-flags.h              |  21 ++-
 include/linux/slab.h                         |   8 +-
 kernel/crash_core.c                          |   2 +-
 kernel/dma/pool.c                            |   8 +-
 kernel/events/ring_buffer.c                  |   2 +-
 mm/Kconfig                                   |  33 ++++-
 mm/compaction.c                              |   8 +-
 mm/debug_vm_pgtable.c                        |   4 +-
 mm/huge_memory.c                             |   2 +-
 mm/hugetlb.c                                 |   4 +-
 mm/internal.h                                |   8 +-
 mm/kmsan/init.c                              |  18 ++-
 mm/memblock.c                                |   8 +-
 mm/memory.c                                  |   4 +-
 mm/memory_hotplug.c                          |   6 +-
 mm/page_alloc.c                              | 127 +++++++++++++-----
 mm/page_isolation.c                          |  14 +-
 mm/page_owner.c                              |   6 +-
 mm/page_reporting.c                          |   8 +-
 mm/shuffle.h                                 |   2 +-
 mm/slab.c                                    |   2 +-
 mm/slub.c                                    |   6 +-
 mm/vmscan.c                                  |   1 -
 mm/vmstat.c                                  |  14 +-
 net/smc/smc_ib.c                             |   2 +-
 scripts/checkpatch.pl                        |   8 ++
 security/integrity/ima/ima_crypto.c          |   2 +-
 tools/testing/memblock/linux/mmzone.h        |   6 +-
 84 files changed, 462 insertions(+), 272 deletions(-)