Message ID | 20241102104235.62560-1-yangyicong@huawei.com (mailing list archive) |
---|---|
Series | Support Armv8.9/v9.4 FEAT_HAFT |
On Sat, 02 Nov 2024 18:42:30 +0800, Yicong Yang wrote:
> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
> and enable ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
> lru-gen aging. Tested with lru-gen in below steps:
> 1. Generate a 1GiB workingset by `stress-ng --vm 1`. Then hang the task to
>    stop accessing the memory. (AF bit won't be updated)
> 2. try to age the memory by /sys/kernel/debug/lru_gen
>
> [...]

Applied to arm64 (for-next/haft), thanks! I added back the ID check as in v3
following Marc pointing out the Ampere erratum. Who knows, we may get similar
bugs for FEAT_HAFT in the future, so better have it covered.

[1/5] arm64/sysreg: Update ID_AA64MMFR1_EL1 register
      https://git.kernel.org/arm64/c/aa47dcda2708
[2/5] arm64: setup: name 'tcr2' register
      https://git.kernel.org/arm64/c/926b66e2ebc8
[3/5] arm64: Add support for FEAT_HAFT
      https://git.kernel.org/arm64/c/efe72541355d
[4/5] arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
      https://git.kernel.org/arm64/c/62df5870ebf7
[5/5] arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()
      https://git.kernel.org/arm64/c/b349a5a2b6e2
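To picture the kind of ID check the reply refers to, here is a minimal C sketch. It assumes that FEAT_HAFT is reported through the HAFDBS field of ID_AA64MMFR1_EL1 (bits [3:0]) with the value 0b0011, and that higher values of the field also imply HAFT. The macro and function names are hypothetical, and this is not the code from the series, which presumably performs the check in assembly during early CPU setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: ID_AA64MMFR1_EL1.HAFDBS occupies bits [3:0]. */
#define HAFDBS_SHIFT 0
#define HAFDBS_MASK  0xfULL
/* Assumed encoding: 0b0011 means FEAT_HAFT is implemented. */
#define HAFDBS_HAFT  0x3ULL

static_assert(HAFDBS_HAFT <= HAFDBS_MASK, "encoding must fit in the field");

/* Hypothetical helper: extract the HAFDBS field from a raw register value. */
static unsigned int hafdbs_field(uint64_t mmfr1)
{
    return (unsigned int)((mmfr1 >> HAFDBS_SHIFT) & HAFDBS_MASK);
}

/*
 * Hypothetical helper: true when the ID register advertises FEAT_HAFT.
 * ID fields are treated as monotonic here, so any value at or above
 * the HAFT encoding counts as "has HAFT" (an assumption of this sketch).
 */
static bool cpu_has_haft(uint64_t mmfr1)
{
    return hafdbs_field(mmfr1) >= HAFDBS_HAFT;
}
```

With such a check in place, the HAFT control bit would only be set on CPUs whose ID register advertises the feature, which is the safety net the reply argues for.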
From: Yicong Yang <yangyicong@hisilicon.com>

This series adds basic support for FEAT_HAFT, introduced in Armv8.9/v9.4, and enables ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in lru-gen aging. Tested with lru-gen using the steps below:

1. Generate a 1GiB working set with `stress-ng --vm 1`, then hang the task so that it stops accessing the memory (the AF bit won't be updated).
2. Try to age the memory through /sys/kernel/debug/lru_gen.

Run the above steps with and without LRU_GEN_NONLEAF_YOUNG (0x4), switching via /sys/kernel/mm/lru_gen/enabled. With LRU_GEN_NONLEAF_YOUNG, the page walk for aging tests and clears the PMD AF bit; otherwise it tests and clears the PTE AF bit. In this case LRU_GEN_NONLEAF_YOUNG improves the efficiency of page scanning: the pages are not being accessed, so there is no need to scan each PTE. Observed ~40% time saved for 1GiB of memory on our emulated platform with LRU_GEN_NONLEAF_YOUNG.

For lru-gen aging:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/mm/multigen_lru.rst?h=v6.11-rc1#n94

Change since v3:
Address the comments per Catalin. Add tags for Patch 1/5 and 2/5.
- Make HAFT an ARM64_CPUCAP_SYSTEM_FEATURE feature, so that:
  o checking the feature is more efficient
  o the race between onlining a non-HAFT CPU and using the HAFT features is avoided
- Set table AF for task entries as well
- Set TCR2.HAFT unconditionally
Link: https://lore.kernel.org/linux-arm-kernel/20241022092734.59984-1-yangyicong@huawei.com/

Change since v2:
- Address comments per Will and Catalin:
  o detect and enable the feature in __cpu_setup()
  o allow onlining a CPU that doesn't have this feature and so mismatches the boot CPU
  o only advertise the feature if it's enabled system-wide
  o set the AF bit for kernel page table entries to save a later hardware update
  o warn on unexpected pmdp_test_and_clear_young()
- Update all the new AA64MMFR1_EL1 fields per Mark
Link: https://lore.kernel.org/linux-arm-kernel/20240814092333.7727-1-yangyicong@huawei.com/

Change since v1:
- Address comments from Marc, improve comments/Kconfig, clean code.
Thanks for the comments.
Link: https://lore.kernel.org/linux-arm-kernel/20240802093458.32683-1-yangyicong@huawei.com/

Yicong Yang (5):
  arm64/sysreg: Update ID_AA64MMFR1_EL1 register
  arm64: setup: name 'tcr2' register
  arm64: Add support for FEAT_HAFT
  arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
  arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()

 arch/arm64/Kconfig                     | 16 ++++++++++++++++
 arch/arm64/include/asm/cpufeature.h    |  6 ++++++
 arch/arm64/include/asm/pgalloc.h       | 12 +++++++-----
 arch/arm64/include/asm/pgtable-hwdef.h |  4 ++++
 arch/arm64/include/asm/pgtable.h       | 10 ++++++++--
 arch/arm64/kernel/cpufeature.c         | 15 +++++++++++++++
 arch/arm64/mm/fixmap.c                 |  9 ++++++---
 arch/arm64/mm/mmu.c                    |  8 ++++----
 arch/arm64/mm/proc.S                   | 16 ++++++++++++++--
 arch/arm64/tools/cpucaps               |  1 +
 arch/arm64/tools/sysreg                |  4 ++++
 11 files changed, 85 insertions(+), 16 deletions(-)
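
For anyone reproducing the test, the with/without LRU_GEN_NONLEAF_YOUNG switch in the cover letter comes down to toggling bit 0x4 in the value of /sys/kernel/mm/lru_gen/enabled. The following userspace C sketch shows that bit arithmetic. The bit meanings follow the multigen_lru admin guide linked above; the `*_BIT` macro names and the helper functions are hypothetical and not part of the kernel or of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Bit values as described in Documentation/admin-guide/mm/multigen_lru.rst. */
#define LRU_GEN_CORE_BIT          0x1u /* main switch */
#define LRU_GEN_MM_WALK_BIT       0x2u /* batch-clear the accessed bit in leaf entries */
#define LRU_GEN_NONLEAF_YOUNG_BIT 0x4u /* clear the accessed bit in non-leaf entries */

static_assert((LRU_GEN_CORE_BIT | LRU_GEN_MM_WALK_BIT |
               LRU_GEN_NONLEAF_YOUNG_BIT) == 0x7u,
              "the three capability bits are expected to be distinct");

/*
 * Hypothetical helper: parse the text read from the sysfs file,
 * e.g. "0x0007\n". Returns 0 on success, -1 if no number was found.
 */
static int parse_lru_gen_enabled(const char *text, unsigned int *mask)
{
    char *end;
    unsigned long val = strtoul(text, &end, 16);

    if (end == text)
        return -1;
    *mask = (unsigned int)val;
    return 0;
}

/* True when the non-leaf (PMD) accessed-bit capability is switched on. */
static bool nonleaf_young_enabled(unsigned int mask)
{
    return (mask & LRU_GEN_NONLEAF_YOUNG_BIT) != 0;
}

/* Mask to write back for the "without LRU_GEN_NONLEAF_YOUNG" run. */
static unsigned int without_nonleaf_young(unsigned int mask)
{
    return mask & ~LRU_GEN_NONLEAF_YOUNG_BIT;
}
```

Under these assumptions, a system reporting 0x0007 would be switched to 0x0003 for the run without LRU_GEN_NONLEAF_YOUNG and back to 0x0007 for the run with it.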