Message ID | 20250321130635.227011-1-alexghiti@rivosinc.com (mailing list archive) |
---|---|
Series | Merge arm64/riscv hugetlbfs contpte support |

On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.

Can we also add powerpc to the dance?

powerpc also uses contiguous PTEs, although there is not (yet) a special
name for it:
- b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages")
- e47168f3d1b1 ("powerpc/8xx: Support 16k hugepages with 4k pages")

powerpc also uses contiguous PMDs/PUDs for larger hugepages:
- 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
- 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
- 0549e7666373 ("powerpc/8xx: rework support for 8M pages using contiguous PTE entries")

Christophe

>
> Both arm64 and riscv support the use of contiguous ptes to map pages that
> are larger than the default page table size, respectively called contpte
> and svnapot.
>
> The riscv implementation differs from the arm64's in that the LSBs of the
> pfn of a svnapot pte are used to store the size of the mapping, allowing
> for future sizes to be added (for now only 64KB is supported). That's an
> issue for the core mm code which expects to find the *real* pfn a pte points
> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
> and restores the size of the mapping when it is written to a page table.
>
> The following patches are just merges of the 2 different implementations
> that currently exist in arm64 and riscv which are very similar. It paves
> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
> reimplementing the same in riscv.
>
> This patchset was tested by running the libhugetlbfs testsuite with 64KB
> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>
> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>
> Changes in v5:
>   - Fix "int i" unused variable in patch 2 (as reported by PW)
>   - Fix !svnapot build
>   - Fix arch_make_huge_pte() which returned a real napot pte
>   - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>     avoid leaking real napot pfns to core mm
>   - Fix arch_contpte_get_num_contig() that used to always try to get the
>     mapping size from the ptep, which does not work if the ptep comes from
>     the core mm
>   - Rebase on top of 6.14-rc7 + fix for
>     huge_ptep_get_and_clear()/huge_pte_clear()
>     https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>
> Changes in v4:
>   - Rebase on top of 6.13
>
> Changes in v3:
>   - Split set_ptes and ptep_get into internal and external API (Ryan)
>   - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>     we split hugetlb functions from contpte functions (actually riscv contpte
>     functions to support THP will come into another series) (Ryan)
>   - Rebase on top of 6.11-rc1
>
> Changes in v2:
>   - Rebase on top of 6.9-rc3
>
> Alexandre Ghiti (9):
>   riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>   riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>   mm: Use common huge_ptep_get() function for riscv/arm64
>   mm: Use common set_huge_pte_at() function for riscv/arm64
>   mm: Use common huge_pte_clear() function for riscv/arm64
>   mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>   mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>   mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>   mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>
>  arch/arm64/Kconfig                  |   1 +
>  arch/arm64/include/asm/hugetlb.h    |  22 +--
>  arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>  arch/arm64/mm/hugetlbpage.c         | 294 +--------------------------
>  arch/riscv/Kconfig                  |   1 +
>  arch/riscv/include/asm/hugetlb.h    |  36 +---
>  arch/riscv/include/asm/pgtable-64.h |  11 ++
>  arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>  arch/riscv/mm/hugetlbpage.c         | 243 +---------------------
>  arch/riscv/mm/pgtable.c             |   6 +-
>  include/linux/hugetlb_contpte.h     |  39 ++++
>  mm/Kconfig                          |   3 +
>  mm/Makefile                         |   1 +
>  mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>  14 files changed, 583 insertions(+), 622 deletions(-)
>  create mode 100644 include/linux/hugetlb_contpte.h
>  create mode 100644 mm/hugetlb_contpte.c
>
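
To make the pfn encoding issue from the cover letter concrete, here is a minimal user-space sketch in C of how a 64KB svnapot pte can carry the mapping size in the low bits of its pfn, and how the base pfn can be recovered from it. The macro and helper names (`pte_mk_napot`, `pte_napot_order`, `pte_napot_base_pfn`) are hypothetical and only model the Svnapot layout as I understand it (N bit at bit 63, PPN field starting at bit 10, low pfn bits `1000` for 64KB); this is not the code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants modelled on the RISC-V 64-bit PTE layout. */
#define PTE_PFN_SHIFT   10                   /* PPN field starts at bit 10 */
#define PTE_PFN_MASK    ((1ULL << 44) - 1)   /* PPN is 44 bits wide */
#define PTE_NAPOT       (1ULL << 63)         /* Svnapot "N" bit */
#define NAPOT_ORDER_64K 4                    /* 16 contiguous 4KB pages */

/* Hypothetical helper: encode the mapping size in the low pfn bits. */
static uint64_t pte_mk_napot(uint64_t pte, unsigned int order)
{
	assert(order >= 1 && order < 32);
	uint64_t low_mask = ((1ULL << order) - 1) << PTE_PFN_SHIFT;
	uint64_t size_bit = 1ULL << (order - 1 + PTE_PFN_SHIFT);

	/* Clear the low 'order' pfn bits, then set the size marker and N. */
	return (pte & ~low_mask) | size_bit | PTE_NAPOT;
}

/* Hypothetical helper: the lowest set pfn bit gives the mapping order. */
static unsigned int pte_napot_order(uint64_t pte)
{
	uint64_t pfn = (pte >> PTE_PFN_SHIFT) & PTE_PFN_MASK;
	unsigned int order = 1;

	assert((pte & PTE_NAPOT) && pfn != 0);
	while (!(pfn & 1)) {
		pfn >>= 1;
		order++;
	}
	return order;
}

/* Hypothetical helper: strip the size bits to get the pfn of the first page. */
static uint64_t pte_napot_base_pfn(uint64_t pte)
{
	unsigned int order = pte_napot_order(pte);
	uint64_t pfn = (pte >> PTE_PFN_SHIFT) & PTE_PFN_MASK;

	return pfn & ~((1ULL << order) - 1);
}
```

With these helpers, a pte whose pfn is 0x12340 ends up with a stored pfn of 0x12348 once made napot, which is exactly the kind of value that core mm code must not see if it expects the real pfn.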
Hi Christophe,

On 21/03/2025 18:24, Christophe Leroy wrote:
>
>
> On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs
>> implementation
>> of arm64 and riscv.
>
> Can we also add powerpc to the dance?
>
> powerpc also uses contiguous PTEs, although there is not (yet) a
> special name for it:
> - b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages")
> - e47168f3d1b1 ("powerpc/8xx: Support 16k hugepages with 4k pages")
>
> powerpc also uses contiguous PMDs/PUDs for larger hugepages:
> - 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
> - 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
> - 0549e7666373 ("powerpc/8xx: rework support for 8M pages using
> contiguous PTE entries")

So I have been looking at the powerpc hugetlb implementation and I have
to admit that I'm struggling to find similarities with how arm64 and
riscv deal with contiguous pte mappings.

I think the 2 main characteristics of contpte (arm64) and svnapot
(riscv) are the break-before-make requirement and the HW A/D
(accessed/dirty) update on only a single pte. Those make the handling of
hugetlb pages very similar between arm64 and riscv.

But I may have missed something: the powerpc hugetlb implementation is
quite "scattered" because of the radix/hash page table and 32/64 bit.

Thanks,

Alex
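
As an illustration of the "HW A/D update on only a single pte" point, the sketch below shows in plain user-space C why reading a contiguous mapping has to look at every pte of the range: the hardware may have set the accessed or dirty bit on any one of them. The function name `contpte_gather_ad` and the bit positions (A at bit 6, D at bit 7, as on riscv) are assumptions for this example; it is a simplified model and not the helper added by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions, following the RISC-V PTE layout. */
#define PTE_ACCESSED (1ULL << 6)
#define PTE_DIRTY    (1ULL << 7)

/*
 * Hypothetical helper: return the first pte of a contiguous range with
 * the accessed/dirty bits of all the other ptes folded in, since the
 * hardware is only guaranteed to update one pte of the range.
 */
static uint64_t contpte_gather_ad(const uint64_t *ptep, size_t ncontig)
{
	assert(ptep && ncontig > 0);
	uint64_t pte = ptep[0];

	for (size_t i = 1; i < ncontig; i++)
		pte |= ptep[i] & (PTE_ACCESSED | PTE_DIRTY);

	return pte;
}
```

For a 64KB mapping on a 4KB base page kernel, `ncontig` would be 16. The break-before-make requirement is not modelled here, since it involves invalidating the entries and flushing the TLB before the new ptes are written.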
Can someone from arm64 review this? I think it's preferable to share the
same implementation between riscv and arm64.

The end goal is the support of mTHP using svnapot on riscv, which we
want soon, so if that patchset does not gain any traction, I'll just
copy/paste the arm64 implementation into riscv.

Thanks,

Alex

On 21/03/2025 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.