Message ID | 20240802093458.32683-1-yangyicong@huawei.com (mailing list archive) |
---|---|
Series | Support Armv8.9/v9.4 FEAT_HAFT |
On Fri, 02 Aug 2024 10:34:56 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
>
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
> and enables ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
> lru-gen aging. Tested with lru-gen using the steps below:
> 1. Generate a 1GiB working set with `stress-ng --vm 1`, then hang the task
>    to stop it accessing the memory. (The AF bit won't be updated.)
> 2. Try to age the memory via /sys/kernel/debug/lru_gen.
>
> Run the above steps with and without LRU_GEN_NONLEAF_YOUNG (0x4)
> (switching via /sys/kernel/mm/lru_gen/enabled). With LRU_GEN_NONLEAF_YOUNG,
> aging clears and tests the PMD AF bit during the page walk; otherwise it
> clears and tests the PTE AF bit. In this case LRU_GEN_NONLEAF_YOUNG
> improves the efficiency of page scanning, since the pages are not
> accessed and we don't need to scan each PTE.

Improve by how much? Can you please publish numbers that demonstrate
the effect of this feature?

Thanks,

        M.

On 2024/8/2 18:40, Marc Zyngier wrote:
> Improve by how much? Can you please publish numbers that demonstrate
> the effect of this feature?
>

With LRU_GEN_NONLEAF_YOUNG, a ~40% time saving was observed for 1GiB of
memory on our emulated platform.

Thanks.

On Tue, 06 Aug 2024 04:43:52 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2024/8/2 18:40, Marc Zyngier wrote:
> > Improve by how much? Can you please publish numbers that demonstrate
> > the effect of this feature?
> >
>
> With LRU_GEN_NONLEAF_YOUNG, a ~40% time saving was observed for 1GiB of
> memory on our emulated platform.

This certainly looks impressive, but it is a very ad-hoc benchmark,
and emulation numbers don't necessarily result in similar improvement
on actual HW.

How does this translate for a more realistic/useful workload? Even
numbers obtained on another architecture would be useful.

Thanks,

        M.

On 2024/8/6 16:06, Marc Zyngier wrote:
> On Tue, 06 Aug 2024 04:43:52 +0100,
> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> With LRU_GEN_NONLEAF_YOUNG, a ~40% time saving was observed for 1GiB of
>> memory on our emulated platform.
>
> This certainly looks impressive, but it is a very ad-hoc benchmark,
> and emulation numbers don't necessarily result in similar improvement
> on actual HW.
>

Yes, indeed. I only designed this case to test that it works. The real
case may be more complex and less ideal, and may also involve other things
like THP (with THP we may already be using a PMD block mapping, so the
advantage of HAFT may not take effect).

> How does this translate for a more realistic/useful workload? Even
> numbers obtained on another architecture would be useful.
>

Currently I have no numbers for a real workload. Maybe as the next step,
once a platform is available (an x86 or arm64 one which can run real
workloads).

Thanks.

From: Yicong Yang <yangyicong@hisilicon.com>

This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
and enables ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
lru-gen aging. Tested with lru-gen using the steps below:
1. Generate a 1GiB working set with `stress-ng --vm 1`, then hang the task
   to stop it accessing the memory. (The AF bit won't be updated.)
2. Try to age the memory via /sys/kernel/debug/lru_gen.

Run the above steps with and without LRU_GEN_NONLEAF_YOUNG (0x4)
(switching via /sys/kernel/mm/lru_gen/enabled). With LRU_GEN_NONLEAF_YOUNG,
aging clears and tests the PMD AF bit during the page walk; otherwise it
clears and tests the PTE AF bit. In this case LRU_GEN_NONLEAF_YOUNG
improves the efficiency of page scanning, since the pages are not
accessed and we don't need to scan each PTE.

For lru-gen aging:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/mm/multigen_lru.rst?h=v6.11-rc1#n94

Yicong Yang (2):
  arm64: Add support for FEAT_HAFT
  arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG

 arch/arm64/Kconfig                     | 21 ++++++++++++++
 arch/arm64/include/asm/pgtable-hwdef.h |  5 ++++
 arch/arm64/include/asm/pgtable.h       | 14 ++++++++--
 arch/arm64/kernel/cpufeature.c         | 38 ++++++++++++++++++++++++++
 arch/arm64/tools/cpucaps               |  1 +
 arch/arm64/tools/sysreg                |  1 +
 6 files changed, 78 insertions(+), 2 deletions(-)
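
The test steps in the cover letter could be scripted roughly as follows. This is only a dry-run sketch under stated assumptions, not the script the author used: the helper names (`lru_gen_mask`, `aging_cmd`, `write_knob`) and the `RUN` variable are hypothetical, and the `memcg_id`, `node_id` and `max_gen_nr` values are placeholders that would have to be read from /sys/kernel/debug/lru_gen on the test machine. The bit values and the `+ memcg_id node_id max_gen_nr` aging command follow the multigen_lru.rst document linked in the cover letter.

```shell
#!/bin/sh
# Dry-run sketch of the cover letter's test steps (hypothetical helper names).
# By default the writes are only printed. Set RUN=1 (as root, with debugfs
# mounted) to actually perform them.

LRU_GEN_ENABLED=/sys/kernel/mm/lru_gen/enabled
LRU_GEN_DEBUG=/sys/kernel/debug/lru_gen

# Bit values as documented in multigen_lru.rst.
LRU_GEN_CORE=1            # 0x1: main switch
LRU_GEN_MM_WALK=2         # 0x2: batch-clear the accessed bit in leaf entries
LRU_GEN_NONLEAF_YOUNG=4   # 0x4: also clear the accessed bit in non-leaf entries

# Print the mask for "enabled"; pass 1 to include LRU_GEN_NONLEAF_YOUNG.
lru_gen_mask() {
    mask=$((LRU_GEN_CORE | LRU_GEN_MM_WALK))
    if [ "$1" = 1 ]; then
        mask=$((mask | LRU_GEN_NONLEAF_YOUNG))
    fi
    printf '0x%x\n' "$mask"
}

# Build the debugfs aging command: "+ memcg_id node_id max_gen_nr".
aging_cmd() {
    printf '+ %s %s %s\n' "$1" "$2" "$3"
}

# Print (default) or perform (RUN=1) a write of value $1 to file $2.
write_knob() {
    if [ "${RUN:-0}" = 1 ]; then
        printf '%s\n' "$1" > "$2"
    else
        printf 'echo "%s" > %s\n' "$1" "$2"
    fi
}

# Step 1 is not scripted here: start `stress-ng --vm 1` and stop the task so
# it no longer touches its memory. How the task is stopped is an assumption;
# the cover letter only says the task is hung.

# Step 2: select the mode, then trigger aging.
# The arguments "1 0 3" (memcg_id, node_id, max_gen_nr) are placeholders.
write_knob "$(lru_gen_mask 1)" "$LRU_GEN_ENABLED"
write_knob "$(aging_cmd 1 0 3)" "$LRU_GEN_DEBUG"
```

Calling `lru_gen_mask 0` instead gives the mask for the comparison run without LRU_GEN_NONLEAF_YOUNG. Timing the aging write in each mode would give the kind of comparison discussed in the replies.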