From patchwork Tue Mar 4 22:19:25 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 14001630
From: Yang Shi
To: ryan.roberts@arm.com, will@kernel.org, catalin.marinas@arm.com,
 Miko.Lenczewski@arm.com, scott@os.amperecomputing.com, cl@gentwo.org
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block
 mapping when rodata=full
Date: Tue, 4 Mar 2025 14:19:25 -0800
Message-ID: <20250304222018.615808-1-yang@os.amperecomputing.com>
X-Mailer: git-send-email 2.47.0

Changelog
=========
v3:
  * Rebased to v6.14-rc4.
  * Based on Miko's BBML2 cpufeature patch
    (https://lore.kernel.org/linux-arm-kernel/20250228182403.6269-3-miko.lenczewski@arm.com/).
    Also included in this series in order to have the complete patchset.
  * Enhanced __create_pgd_mapping() to handle split as well per Ryan.
  * Supported CONT mappings per Ryan.
  * Supported asymmetric systems by splitting the kernel linear mapping if
    such a system is detected, per Ryan. I don't have such a system to test
    on, so the testing was done by hacking the kernel to call linear mapping
    repainting unconditionally. The linear mapping doesn't have any block
    or cont mappings after booting.

RFC v2:
  * Used an allowlist to advertise BBM lv2 on the CPUs which can handle TLB
    conflicts gracefully, per Will Deacon.
  * Rebased onto v6.13-rc5.
  * https://lore.kernel.org/linux-arm-kernel/20250103011822.1257189-1-yang@os.amperecomputing.com/

RFC v1: https://lore.kernel.org/lkml/20241118181711.962576-1-yang@os.amperecomputing.com/

Description
===========
When rodata=full is used, the kernel linear mapping is mapped by PTEs due
to Arm's break-before-make rule.

A number of performance issues arise when the kernel linear map is using
PTE entries due to Arm's break-before-make rule:
  - performance degradation
  - more TLB pressure
  - memory waste for kernel page tables

These issues can be avoided by specifying rodata=on on the kernel command
line, but this disables the alias checks on page table permissions and
therefore compromises security somewhat.

With FEAT_BBM level 2 support it is no longer necessary to invalidate the
page table entry when changing page sizes. This allows the kernel to
split large mappings after boot is complete.

This series adds support for splitting large mappings when FEAT_BBM level 2
is available and rodata=full is used. This functionality will be used
when modifying page permissions for individual page frames.

Without FEAT_BBM level 2 we will keep the kernel linear map using PTEs
only.

If the system is asymmetric, the kernel linear mapping may be repainted once
the BBML2 capability is finalized on all CPUs. See patch #6 for more details.

We saw significant performance increases in some benchmarks with
rodata=full without compromising the security features of the kernel.

Testing
=======
The tests were run on an AmpereOne machine (192 cores, 1P) with 256GB of
memory, 4K page size and 48-bit VA.

Function test (4K/16K/64K page size)
  - Kernel boot. The kernel needs to change linear mapping permissions at
    boot; if the patches didn't work, the kernel typically didn't boot.
  - Module stress from stress-ng. Loading a kernel module changes
    permissions on the linear mapping.
  - A test kernel module which allocates 80% of total memory via vmalloc(),
    changes the vmalloc area permission to RO (which also changes the
    linear mapping permission to RO), then changes it back before vfree().
    Then launch a VM which consumes almost all physical memory.
  - VM with the patchset applied in the guest kernel too.
  - Kernel build in a VM whose guest kernel has this series applied.
  - rodata=on. Make sure the other rodata mode is not broken.
  - Boot on a machine which doesn't support BBML2.

Performance
===========
Memory consumption
Before:
MemTotal:       258988984 kB
MemFree:        254821700 kB

After:
MemTotal:       259505132 kB
MemFree:        255410264 kB

Around 500MB more memory is free to use. The larger the machine, the
more memory is saved.

Performance benchmarking
* Memcached
We saw performance degradation when running the Memcached benchmark with
rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure.
With this patchset, ops/sec increased by around 3.5% and P99 latency
dropped by around 9.6%.
The gain mainly came from reduced kernel TLB misses. The kernel TLB
MPKI is reduced by 28.5%.

The benchmark data is now on par with rodata=on too.

* Disk encryption (dm-crypt) benchmark
Ran the fio benchmark with the command below on a 128G ramdisk (ext4) with
disk encryption (by dm-crypt).
fio --directory=/data --random_generator=lfsr --norandommap --randrepeat 1 \
    --status-interval=999 --rw=write --bs=4k --loops=1 --ioengine=sync \
    --iodepth=1 --numjobs=1 --fsync_on_close=1 --group_reporting --thread \
    --name=iops-test-job --eta-newline=1 --size 100G

The IOPS increased by 90% - 150% (the variance is high, but the worst
result in the good case is still around 90% above the best result in the
bad case). The bandwidth increased and the avg clat dropped
proportionally.

* Sequential file read
Read a 100G file sequentially on XFS (xfs_io read with the page cache
populated). The bandwidth increased by 150%.

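As a rough sanity check on the memory consumption figures above (this is
not part of the series), the cost of last-level page tables for a
PTE-mapped linear map can be estimated by hand. The sketch below is in
Python and assumes 8-byte descriptors, the 4K granule used on the test
machine, and exactly 256 GiB of mapped memory; it ignores upper-level
tables and any regions that stay split. The helper name leaf_table_bytes
is made up for illustration.

```python
# Back-of-the-envelope estimate, under the assumptions stated above.
DESCRIPTOR_BYTES = 8          # assumed size of one page table descriptor

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def leaf_table_bytes(mapped_bytes: int, mapping_size: int) -> int:
    """Bytes of descriptors needed to map mapped_bytes in mapping_size units."""
    entries = mapped_bytes // mapping_size
    return entries * DESCRIPTOR_BYTES


ram = 256 * GIB                               # size of the test machine's memory

# One descriptor per 4K page when the linear map is PTE-mapped.
pte_cost = leaf_table_bytes(ram, 4 * KIB)

# One descriptor per 2M region at the level above; these entries exist in
# both layouts (as table pointers or as block mappings).
pmd_cost = leaf_table_bytes(ram, 2 * MIB)

# Deltas taken from the MemTotal/MemFree numbers quoted above (in kB).
memtotal_delta_kb = 259505132 - 258988984
memfree_delta_kb = 255410264 - 254821700

print("PTE tables:          ", pte_cost // MIB, "MiB")
print("2M-level descriptors:", pmd_cost // MIB, "MiB")
print("MemTotal delta:      ", memtotal_delta_kb // 1024, "MiB")
print("MemFree delta:       ", memfree_delta_kb // 1024, "MiB")
```

Under these assumptions the PTE tables alone come to 512 MiB, which is in
the same range as the observed MemTotal increase of about 504 MiB. The
estimate is only indicative: parts of the linear map may stay split, and
upper-level tables are not counted.
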
Mikołaj Lenczewski (1):
      arm64: Add BBM Level 2 cpu feature

Yang Shi (5):
      arm64: cpufeature: add AmpereOne to BBML2 allow list
      arm64: mm: make __create_pgd_mapping() and helpers non-void
      arm64: mm: support large block mapping when rodata=full
      arm64: mm: support split CONT mappings
      arm64: mm: split linear mapping if BBML2 is not supported on secondary CPUs

 arch/arm64/Kconfig                  |  11 +++++
 arch/arm64/include/asm/cpucaps.h    |   2 +
 arch/arm64/include/asm/cpufeature.h |  15 ++++++
 arch/arm64/include/asm/mmu.h        |   4 ++
 arch/arm64/include/asm/pgtable.h    |  12 ++++-
 arch/arm64/kernel/cpufeature.c      |  95 +++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/mmu.c                 | 397 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
 arch/arm64/mm/pageattr.c            |  37 ++++++++++++---
 arch/arm64/tools/cpucaps            |   1 +
 9 files changed, 518 insertions(+), 56 deletions(-)