
[1/1] mm/pgtable/debug: Add test validating architecture page table helpers

Message ID 1567497706-8649-2-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Series mm/debug: Add tests for architecture exported page table helpers

Commit Message

Anshuman Khandual Sept. 3, 2019, 8:01 a.m. UTC
This adds a test module which validates architecture page table helpers
and accessors for compliance with expected generic MM semantics. This will
help various architectures validate changes to existing page table helpers
or additions of new ones.

The test page table, and the memory pages used to create its entries at
various levels, are all allocated from system memory with the required
alignments. If memory pages with the required size and alignment cannot be
allocated, the individual tests that depend on them are skipped.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Steven Price <Steven.Price@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sri Krishna chowdary <schowdary@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 mm/Kconfig.debug       |  14 ++
 mm/Makefile            |   1 +
 mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 440 insertions(+)
 create mode 100644 mm/arch_pgtable_test.c

Comments

Anshuman Khandual Sept. 4, 2019, 6:14 a.m. UTC | #1
On 09/03/2019 04:43 PM, kbuild test robot wrote:
> Hi Anshuman,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc7 next-20190902]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-for-architecture-exported-page-table-helpers/20190903-162959
> config: m68k-allmodconfig (attached as .config)
> compiler: m68k-linux-gcc (GCC) 7.4.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=7.4.0 make.cross ARCH=m68k 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All error/warnings (new ones prefixed by >>):
> 
>    In file included from arch/m68k/include/asm/bug.h:32:0,
>                     from include/linux/bug.h:5,
>                     from include/linux/thread_info.h:12,
>                     from include/asm-generic/preempt.h:5,
>                     from ./arch/m68k/include/generated/asm/preempt.h:1,
>                     from include/linux/preempt.h:78,
>                     from arch/m68k/include/asm/irqflags.h:6,
>                     from include/linux/irqflags.h:16,
>                     from arch/m68k/include/asm/atomic.h:6,
>                     from include/linux/atomic.h:7,
>                     from include/linux/mm_types_task.h:13,
>                     from include/linux/mm_types.h:5,
>                     from include/linux/hugetlb.h:5,
>                     from mm/arch_pgtable_test.c:14:
>    mm/arch_pgtable_test.c: In function 'pmd_clear_tests':
>>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
>     #define pmd_val(x) ((&x)->pmd[0])
>                          ^
>    include/asm-generic/bug.h:124:25: note: in definition of macro 'WARN_ON'
>      int __ret_warn_on = !!(condition);    \
>                             ^~~~~~~~~
>>> arch/m68k/include/asm/motorola_pgtable.h:138:26: note: in expansion of macro 'pmd_val'
>     #define pmd_none(pmd)  (!pmd_val(pmd))
>                              ^~~~~~~
>>> mm/arch_pgtable_test.c:233:11: note: in expansion of macro 'pmd_none'
>      WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
>               ^~~~~~~~
>    mm/arch_pgtable_test.c: In function 'pmd_populate_tests':
>>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
>     #define pmd_val(x) ((&x)->pmd[0])

Storing READ_ONCE(*pmdp) in a local pmd_t variable first solves the problem.
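
For example, a minimal sketch of that change against pmd_clear_tests()
from this patch (hypothetical, untested):

	static void pmd_clear_tests(pmd_t *pmdp)
	{
		pmd_t pmd;

		memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
		pmd_clear(pmdp);
		/* the local is an lvalue, so m68k's pmd_val(), which takes
		 * the address of its argument, compiles again */
		pmd = READ_ONCE(*pmdp);
		WARN_ON(!pmd_none(pmd));
	}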
Kirill A. Shutemov Sept. 4, 2019, 2:19 p.m. UTC | #2
On Tue, Sep 03, 2019 at 01:31:46PM +0530, Anshuman Khandual wrote:
> This adds a test module which validates architecture page table helpers
> and accessors for compliance with expected generic MM semantics. This will
> help various architectures validate changes to existing page table helpers
> or additions of new ones.
> 
> The test page table, and the memory pages used to create its entries at
> various levels, are all allocated from system memory with the required
> alignments. If memory pages with the required size and alignment cannot be
> allocated, the individual tests that depend on them are skipped.

See my comments below.

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Steven Price <Steven.Price@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Sri Krishna chowdary <schowdary@nvidia.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: James Hogan <jhogan@kernel.org>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-mips@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  mm/Kconfig.debug       |  14 ++
>  mm/Makefile            |   1 +
>  mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 440 insertions(+)
>  create mode 100644 mm/arch_pgtable_test.c
> 
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 327b3ebf23bf..ce9c397f7b07 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -117,3 +117,17 @@ config DEBUG_RODATA_TEST
>      depends on STRICT_KERNEL_RWX
>      ---help---
>        This option enables a testcase for the setting rodata read-only.
> +
> +config DEBUG_ARCH_PGTABLE_TEST
> +	bool "Test arch page table helpers for semantics compliance"
> +	depends on MMU
> +	depends on DEBUG_KERNEL
> +	help
> +	  This option provides a kernel module which can be used to test
> +	  architecture page table helper functions on various platforms to
> +	  verify whether they comply with expected generic MM semantics. This
> +	  will help architecture code ensure that any changes to, or new
> +	  additions of, these helpers still conform to expected generic MM
> +	  semantics.
> +
> +	  If unsure, say N.
> diff --git a/mm/Makefile b/mm/Makefile
> index d996846697ef..bb572c5aa8c5 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>  obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
>  obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
>  obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
> +obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
>  obj-$(CONFIG_PAGE_OWNER) += page_owner.o
>  obj-$(CONFIG_CLEANCACHE) += cleancache.o
>  obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
> new file mode 100644
> index 000000000000..f15be8a73723
> --- /dev/null
> +++ b/mm/arch_pgtable_test.c
> @@ -0,0 +1,425 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * This kernel module validates architecture page table helpers &
> + * accessors and helps in verifying their continued compliance with
> + * generic MM semantics.
> + *
> + * Copyright (C) 2019 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
> +
> +#include <linux/kernel.h>
> +#include <linux/hugetlb.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/mm_types.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +#include <linux/pfn_t.h>
> +#include <linux/gfp.h>
> +#include <linux/spinlock.h>
> +#include <linux/sched/mm.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +
> +/*
> + * Basic operations
> + *
> + * mkold(entry)			= An old and not a young entry
> + * mkyoung(entry)		= A young and not an old entry
> + * mkdirty(entry)		= A dirty and not a clean entry
> + * mkclean(entry)		= A clean and not a dirty entry
> + * mkwrite(entry)		= A write and not a write protected entry
> + * wrprotect(entry)		= A write protected and not a write entry
> + * pxx_bad(entry)		= A mapped and non-table entry
> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
> + */
> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)

What is special about this address? How do you know if it is not occupied
yet?

> +#define VMA_TEST_FLAGS	(VM_READ|VM_WRITE|VM_EXEC)
> +#define RANDOM_NZVALUE	(0xbe)
> +
> +static bool pud_aligned;
> +static bool pmd_aligned;
> +
> +extern struct mm_struct *mm_alloc(void);
> +
> +static void pte_basic_tests(struct page *page, pgprot_t prot)
> +{
> +	pte_t pte = mk_pte(page, prot);
> +
> +	WARN_ON(!pte_same(pte, pte));
> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
> +	WARN_ON(pte_young(pte_mkold(pte)));
> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
> +	WARN_ON(pte_write(pte_wrprotect(pte)));
> +}
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +static void pmd_basic_tests(struct page *page, pgprot_t prot)
> +{
> +	pmd_t pmd;
> +
> +	/*
> +	 * Memory block here must be PMD_SIZE aligned. Abort this
> +	 * test in case we could not allocate such a memory block.
> +	 */
> +	if (!pmd_aligned) {
> +		pr_warn("Could not proceed with PMD tests\n");
> +		return;
> +	}
> +
> +	pmd = mk_pmd(page, prot);
> +	WARN_ON(!pmd_same(pmd, pmd));
> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pmd_bad().
> +	 */
> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> +}
> +#else
> +static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
> +#endif
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void pud_basic_tests(struct page *page, pgprot_t prot)
> +{
> +	pud_t pud;
> +
> +	/*
> +	 * Memory block here must be PUD_SIZE aligned. Abort this
> +	 * test in case we could not allocate such a memory block.
> +	 */
> +	if (!pud_aligned) {
> +		pr_warn("Could not proceed with PUD tests\n");
> +		return;
> +	}
> +
> +	pud = pfn_pud(page_to_pfn(page), prot);
> +	WARN_ON(!pud_same(pud, pud));
> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
> +	WARN_ON(pud_write(pud_wrprotect(pud)));
> +	WARN_ON(pud_young(pud_mkold(pud)));
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pud_bad().
> +	 */
> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
> +#endif
> +}
> +#else
> +static void pud_basic_tests(struct page *page, pgprot_t prot) { }
> +#endif
> +
> +static void p4d_basic_tests(struct page *page, pgprot_t prot)
> +{
> +	p4d_t p4d;
> +
> +	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
> +	WARN_ON(!p4d_same(p4d, p4d));
> +}
> +
> +static void pgd_basic_tests(struct page *page, pgprot_t prot)
> +{
> +	pgd_t pgd;
> +
> +	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
> +	WARN_ON(!pgd_same(pgd, pgd));
> +}
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(pud_t *pudp)
> +{
> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> +	pud_clear(pudp);
> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
> +}
> +
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pud_bad().
> +	 */
> +	pmd_clear(pmdp);
> +	pud_clear(pudp);
> +	pud_populate(mm, pudp, pmdp);
> +	WARN_ON(pud_bad(READ_ONCE(*pudp)));
> +}
> +#else
> +static void pud_clear_tests(pud_t *pudp) { }
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> +}
> +#endif
> +
> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
> +static void p4d_clear_tests(p4d_t *p4dp)
> +{
> +	memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
> +	p4d_clear(p4dp);
> +	WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
> +}
> +
> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
> +{
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as p4d_bad().
> +	 */
> +	pud_clear(pudp);
> +	p4d_clear(p4dp);
> +	p4d_populate(mm, p4dp, pudp);
> +	WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
> +}
> +#else
> +static void p4d_clear_tests(p4d_t *p4dp) { }
> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
> +{
> +}
> +#endif
> +
> +#ifndef __PAGETABLE_P4D_FOLDED
> +static void pgd_clear_tests(pgd_t *pgdp)
> +{
> +	memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
> +	pgd_clear(pgdp);
> +	WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
> +}
> +
> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> +{
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pgd_bad().
> +	 */
> +	p4d_clear(p4dp);
> +	pgd_clear(pgdp);
> +	pgd_populate(mm, pgdp, p4dp);
> +	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
> +}
> +#else
> +static void pgd_clear_tests(pgd_t *pgdp) { }
> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> +{
> +}
> +#endif

This will not work if p4d is folded at runtime, as on x86-64 and s390.

Here's the fixup. It should work for both x86-64 and s390, but I only
tested on x86-64:

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 52e5f5f2240d..b882792a3999 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
 #define pgtable_l5_enabled() 0
 #endif /* CONFIG_X86_5LEVEL */
 
+#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
+
 extern unsigned int pgdir_shift;
 extern unsigned int ptrs_per_p4d;
 
diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
index f15be8a73723..206fe3334a28 100644
--- a/mm/arch_pgtable_test.c
+++ b/mm/arch_pgtable_test.c
@@ -193,9 +193,11 @@ static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
 }
 #endif
 
-#ifndef __PAGETABLE_P4D_FOLDED
 static void pgd_clear_tests(pgd_t *pgdp)
 {
+	if (mm_p4d_folded(mm))
+		return;
+
 	memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
 	pgd_clear(pgdp);
 	WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
@@ -203,6 +205,9 @@ static void pgd_clear_tests(pgd_t *pgdp)
 
 static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
 {
+	if (mm_p4d_folded(mm))
+		return;
+
 	/*
 	 * This entry points to next level page table page.
 	 * Hence this must not qualify as pgd_bad().
@@ -212,12 +217,6 @@ static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
 	pgd_populate(mm, pgdp, p4dp);
 	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
 }
-#else
-static void pgd_clear_tests(pgd_t *pgdp) { }
-static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
-{
-}
-#endif
 
 static void pte_clear_tests(pte_t *ptep)
 {

> +
> +static void pte_clear_tests(pte_t *ptep)
> +{
> +	memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
> +	pte_clear(NULL, 0, ptep);
> +	WARN_ON(!pte_none(READ_ONCE(*ptep)));
> +}
> +
> +static void pmd_clear_tests(pmd_t *pmdp)
> +{
> +	memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
> +	pmd_clear(pmdp);
> +	WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
> +}
> +
> +static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
> +			       pgtable_t pgtable)
> +{
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pmd_bad().
> +	 */
> +	pmd_clear(pmdp);
> +	pmd_populate(mm, pmdp, pgtable);
> +	WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
> +}
> +
> +static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
> +			    unsigned long nr_pages)
> +{
> +	unsigned long i, end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (i = start_pfn; i < end_pfn; i++) {
> +		if (!pfn_valid(i))
> +			return false;
> +
> +		page = pfn_to_page(i);
> +
> +		if (page_zone(page) != z)
> +			return false;
> +
> +		if (PageReserved(page))
> +			return false;
> +
> +		if (page_count(page) > 0)
> +			return false;
> +
> +		if (PageHuge(page))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static struct page *alloc_gigantic_page(nodemask_t *nodemask,
> +					int nid, gfp_t gfp_mask, int order)
> +{
> +	struct zonelist *zonelist;
> +	struct zone *zone;
> +	struct zoneref *z;
> +	enum zone_type zonesel;
> +	unsigned long ret, pfn, flags, nr_pages;
> +
> +	nr_pages = 1UL << order;
> +	zonesel = gfp_zone(gfp_mask);
> +	zonelist = node_zonelist(nid, gfp_mask);
> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
> +		spin_lock_irqsave(&zone->lock, flags);
> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
> +			if (pfn_range_valid(zone, pfn, nr_pages)) {
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +				ret = alloc_contig_range(pfn, pfn + nr_pages,
> +							 MIGRATE_MOVABLE,
> +							 gfp_mask);
> +				if (!ret)
> +					return pfn_to_page(pfn);
> +				spin_lock_irqsave(&zone->lock, flags);
> +			}
> +			pfn += nr_pages;
> +		}
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +	}
> +	return NULL;
> +}
> +
> +static struct page *alloc_mapped_page(void)
> +{
> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
> +	struct page *page = NULL;
> +
> +	page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
> +				   gfp_mask, get_order(PUD_SIZE));
> +	if (page) {
> +		pud_aligned = true;
> +		pmd_aligned = true;
> +		return page;
> +	}
> +
> +	page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
> +	if (page) {
> +		pmd_aligned = true;
> +		return page;
> +	}
> +	return alloc_page(gfp_mask);
> +}
> +
> +static void free_mapped_page(struct page *page)
> +{
> +	if (pud_aligned) {
> +		unsigned long pfn = page_to_pfn(page);
> +
> +		free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
> +		return;
> +	}
> +
> +	if (pmd_aligned) {
> +		int order = get_order(PMD_SIZE);
> +
> +		free_pages((unsigned long)page_address(page), order);
> +		return;
> +	}
> +	free_page((unsigned long)page_address(page));
> +}
> +
> +static int __init arch_pgtable_tests_init(void)
> +{
> +	struct mm_struct *mm;
> +	struct page *page;
> +	pgd_t *pgdp;
> +	p4d_t *p4dp, *saved_p4dp;
> +	pud_t *pudp, *saved_pudp;
> +	pmd_t *pmdp, *saved_pmdp;
> +	pte_t *ptep, *saved_ptep;
> +	pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
> +	unsigned long vaddr = VADDR_TEST;
> +
> +	mm = mm_alloc();
> +	if (!mm) {
> +		pr_err("mm_struct allocation failed\n");
> +		return 1;
> +	}
> +
> +	page = alloc_mapped_page();
> +	if (!page) {
> +		pr_err("memory allocation failed\n");
> +		return 1;
> +	}
> +
> +	pgdp = pgd_offset(mm, vaddr);
> +	p4dp = p4d_alloc(mm, pgdp, vaddr);
> +	pudp = pud_alloc(mm, p4dp, vaddr);
> +	pmdp = pmd_alloc(mm, pudp, vaddr);
> +	ptep = pte_alloc_map(mm, pmdp, vaddr);
> +
> +	/*
> +	 * Save all the page table page addresses as the page table
> +	 * entries will be used for testing with random or garbage
> +	 * values. These saved addresses will be used for freeing
> +	 * page table pages.
> +	 */
> +	saved_p4dp = p4d_offset(pgdp, 0UL);
> +	saved_pudp = pud_offset(p4dp, 0UL);
> +	saved_pmdp = pmd_offset(pudp, 0UL);
> +	saved_ptep = pte_offset_map(pmdp, 0UL);
> +
> +	pte_basic_tests(page, prot);
> +	pmd_basic_tests(page, prot);
> +	pud_basic_tests(page, prot);
> +	p4d_basic_tests(page, prot);
> +	pgd_basic_tests(page, prot);
> +
> +	pte_clear_tests(ptep);
> +	pmd_clear_tests(pmdp);
> +	pud_clear_tests(pudp);
> +	p4d_clear_tests(p4dp);
> +	pgd_clear_tests(pgdp);
> +
> +	pmd_populate_tests(mm, pmdp, (pgtable_t) page);

This is not correct for architectures that define pgtable_t as a pte_t
pointer, not a struct page pointer.

> +	pud_populate_tests(mm, pudp, pmdp);
> +	p4d_populate_tests(mm, p4dp, pudp);
> +	pgd_populate_tests(mm, pgdp, p4dp);

This is wrong. All the p?dp pointers point to the second entry in their
page table page. That is not a valid page table pointer and triggers
p?d_bad() on x86.

Use saved_p?dp instead.

> +
> +	p4d_free(mm, saved_p4dp);
> +	pud_free(mm, saved_pudp);
> +	pmd_free(mm, saved_pmdp);
> +	pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
> +
> +	mm_dec_nr_puds(mm);
> +	mm_dec_nr_pmds(mm);
> +	mm_dec_nr_ptes(mm);
> +	__mmdrop(mm);
> +
> +	free_mapped_page(page);
> +	return 0;
> +}
> +
> +static void __exit arch_pgtable_tests_exit(void) { }
> +
> +module_init(arch_pgtable_tests_init);
> +module_exit(arch_pgtable_tests_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Anshuman Khandual <anshuman.khandual@arm.com>");
> +MODULE_DESCRIPTION("Test architecture page table helpers");
> -- 
> 2.20.1
> 
>
Gerald Schaefer Sept. 4, 2019, 8:16 p.m. UTC | #3
On Tue,  3 Sep 2019 13:31:46 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> This adds a test module which validates architecture page table helpers
> and accessors for compliance with expected generic MM semantics. This will
> help various architectures validate changes to existing page table helpers
> or additions of new ones.
> 
> The test page table, and the memory pages used to create its entries at
> various levels, are all allocated from system memory with the required
> alignments. If memory pages with the required size and alignment cannot be
> allocated, the individual tests that depend on them are skipped.

This looks very useful, thanks. Of course, s390 is quite special and does
not work nicely with this patch (yet), mostly because of our dynamic page
table levels/folding. Still need to figure out what can be fixed in the arch
code and what would need to be changed in the test module. See below for some
generic comments/questions.

At least one real bug in the s390 code was already revealed by this, which
is very nice. In pmd/pud_bad(), we also check large pmds/puds for sanity,
instead of reporting them as bad, which is apparently not what is expected.

[...]
> +/*
> + * Basic operations
> + *
> + * mkold(entry)			= An old and not a young entry
> + * mkyoung(entry)		= A young and not an old entry
> + * mkdirty(entry)		= A dirty and not a clean entry
> + * mkclean(entry)		= A clean and not a dirty entry
> + * mkwrite(entry)		= A write and not a write protected entry
> + * wrprotect(entry)		= A write protected and not a write entry
> + * pxx_bad(entry)		= A mapped and non-table entry
> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
> + */
> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)

Why is P4D_SIZE missing in the VADDR_TEST calculation?

[...]
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(pud_t *pudp)
> +{
> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> +	pud_clear(pudp);
> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
> +}

For pgd/p4d/pud_clear(), we only clear if the page table level is present
and not folded. The memset() here overwrites the table type bits, so
pud_clear() will not clear anything on s390 and the pud_none() check will
fail.
Would it be possible to OR a (larger) random value into the table, so that
the lower 12 bits would be preserved?
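
For illustration, such a test could look something like this (a
hypothetical sketch; RANDOM_ORVALUE is an assumed large random constant
whose low 12 bits are masked off, so the s390 table type bits survive):

	static void pud_clear_tests(pud_t *pudp)
	{
		pud_t pud = READ_ONCE(*pudp);

		/* set garbage in the upper bits, keep the low 12 intact */
		pud = __pud(pud_val(pud) | (RANDOM_ORVALUE & ~0xfffUL));
		WRITE_ONCE(*pudp, pud);
		pud_clear(pudp);
		WARN_ON(!pud_none(READ_ONCE(*pudp)));
	}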

> +
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pud_bad().
> +	 */
> +	pmd_clear(pmdp);
> +	pud_clear(pudp);
> +	pud_populate(mm, pudp, pmdp);
> +	WARN_ON(pud_bad(READ_ONCE(*pudp)));
> +}

This will populate the pud with a pmd pointer that does not point to the
beginning of the pmd table, but to the second entry (because of how
VADDR_TEST is constructed). This will result in a failing pud_bad() check
on s390. Not sure why/how it works on other archs, but would it be possible
to align pmdp down to the beginning of the pmd table (and similar for the
other pxd_populate_tests)?
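
Something along these lines, perhaps (a hypothetical sketch, assuming the
pmd table occupies exactly one page):

	pud_populate(mm, pudp, (pmd_t *)((unsigned long)pmdp & PAGE_MASK));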

[...]
> +
> +	p4d_free(mm, saved_p4dp);
> +	pud_free(mm, saved_pudp);
> +	pmd_free(mm, saved_pmdp);
> +	pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));

pgtable_t is arch-specific, and on s390 it is not a struct page pointer,
but a pte pointer. So this will also go wrong on all other archs (if any)
where pgtable_t is not struct page.
Would it be possible to use pte_free_kernel() instead, and just pass
saved_ptep directly?
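
I.e., something like this in the cleanup path (a sketch):

	-	pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
	+	pte_free_kernel(mm, saved_ptep);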

Regards,
Gerald
Dave Hansen Sept. 4, 2019, 11:14 p.m. UTC | #4
On 9/3/19 1:01 AM, Anshuman Khandual wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.

This looks really cool.  The "only" complication on x86 is the large
number of compile and runtime options that we have.  When this gets
merged, it would be really nice to make sure that the 0day guys have
good coverage of all the configurations.

I'm not _quite_ sure what kind of bugs it will catch on x86 and I
suspect it'll have more value for the other architectures, but it seems
harmless enough.
Anshuman Khandual Sept. 5, 2019, 8:18 a.m. UTC | #5
On 09/04/2019 07:49 PM, Kirill A. Shutemov wrote:
> On Tue, Sep 03, 2019 at 01:31:46PM +0530, Anshuman Khandual wrote:
>> This adds a test module which validates architecture page table helpers
>> and accessors for compliance with expected generic MM semantics. This will
>> help various architectures validate changes to existing page table helpers
>> or additions of new ones.
>>
>> The test page table, and the memory pages used to create its entries at
>> various levels, are all allocated from system memory with the required
>> alignments. If memory pages with the required size and alignment cannot be
>> allocated, the individual tests that depend on them are skipped.
> 
> See my comments below.
> 
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Mark Brown <broonie@kernel.org>
>> Cc: Steven Price <Steven.Price@arm.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Sri Krishna chowdary <schowdary@nvidia.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Vineet Gupta <vgupta@synopsys.com>
>> Cc: James Hogan <jhogan@kernel.org>
>> Cc: Paul Burton <paul.burton@mips.com>
>> Cc: Ralf Baechle <ralf@linux-mips.org>
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-mips@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-ia64@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: linux-sh@vger.kernel.org
>> Cc: sparclinux@vger.kernel.org
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  mm/Kconfig.debug       |  14 ++
>>  mm/Makefile            |   1 +
>>  mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 440 insertions(+)
>>  create mode 100644 mm/arch_pgtable_test.c
>>
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index 327b3ebf23bf..ce9c397f7b07 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -117,3 +117,17 @@ config DEBUG_RODATA_TEST
>>      depends on STRICT_KERNEL_RWX
>>      ---help---
>>        This option enables a testcase for the setting rodata read-only.
>> +
>> +config DEBUG_ARCH_PGTABLE_TEST
>> +	bool "Test arch page table helpers for semantics compliance"
>> +	depends on MMU
>> +	depends on DEBUG_KERNEL
>> +	help
>> +	  This option provides a kernel module which can be used to test
>> +	  architecture page table helper functions on various platforms to
>> +	  verify whether they comply with expected generic MM semantics. This
>> +	  will help architecture code ensure that any changes to, or new
>> +	  additions of, these helpers still conform to expected generic MM
>> +	  semantics.
>> +
>> +	  If unsure, say N.
>> diff --git a/mm/Makefile b/mm/Makefile
>> index d996846697ef..bb572c5aa8c5 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>>  obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
>>  obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
>>  obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
>> +obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
>>  obj-$(CONFIG_PAGE_OWNER) += page_owner.o
>>  obj-$(CONFIG_CLEANCACHE) += cleancache.o
>>  obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
>> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
>> new file mode 100644
>> index 000000000000..f15be8a73723
>> --- /dev/null
>> +++ b/mm/arch_pgtable_test.c
>> @@ -0,0 +1,425 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * This kernel module validates architecture page table helpers &
>> + * accessors and helps in verifying their continued compliance with
>> + * generic MM semantics.
>> + *
>> + * Copyright (C) 2019 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/mm.h>
>> +#include <linux/mman.h>
>> +#include <linux/mm_types.h>
>> +#include <linux/module.h>
>> +#include <linux/printk.h>
>> +#include <linux/swap.h>
>> +#include <linux/swapops.h>
>> +#include <linux/pfn_t.h>
>> +#include <linux/gfp.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/sched/mm.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/pgtable.h>
>> +
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry)			= An old and not a young entry
>> + * mkyoung(entry)		= A young and not an old entry
>> + * mkdirty(entry)		= A dirty and not a clean entry
>> + * mkclean(entry)		= A clean and not a dirty entry
>> + * mkwrite(entry)		= A write and not a write protected entry
>> + * wrprotect(entry)		= A write protected and not a write entry
>> + * pxx_bad(entry)		= A mapped and non-table entry
>> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
>> + */
>> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
> 
> What is special about this address? How do you know if it is not occupied
> yet?

We are creating the page table from scratch for a given random virtual
address 'VADDR_TEST' after allocating an mm_struct. Hence nothing is
occupied just yet. There is nothing special about this address, just that
it tries to ensure that page table entries are created at some offset from
the beginning of the respective page table page at each level. The idea is
to have a more representative page table structure for the test.
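
A quick illustration (assuming a full, non-folded 4-level layout):

	/*
	 * VADDR_TEST = PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE, so
	 *
	 *	pgd_index(VADDR_TEST) == 1
	 *	pud_index(VADDR_TEST) == 1
	 *	pmd_index(VADDR_TEST) == 1
	 *	pte_index(VADDR_TEST) == 1
	 *
	 * i.e. every entry used by the test sits one slot past the start of
	 * its page table page. With five levels, p4d_index() stays 0 since
	 * P4D_SIZE is not part of the sum.
	 */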

> 
>> +#define VMA_TEST_FLAGS	(VM_READ|VM_WRITE|VM_EXEC)
>> +#define RANDOM_NZVALUE	(0xbe)
>> +
>> +static bool pud_aligned;
>> +static bool pmd_aligned;
>> +
>> +extern struct mm_struct *mm_alloc(void);
>> +
>> +static void pte_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> +	pte_t pte = mk_pte(page, prot);
>> +
>> +	WARN_ON(!pte_same(pte, pte));
>> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
>> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
>> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
>> +	WARN_ON(pte_young(pte_mkold(pte)));
>> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
>> +	WARN_ON(pte_write(pte_wrprotect(pte)));
>> +}
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +static void pmd_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> +	pmd_t pmd;
>> +
>> +	/*
>> +	 * Memory block here must be PMD_SIZE aligned. Abort this
>> +	 * test in case we could not allocate such a memory block.
>> +	 */
>> +	if (!pmd_aligned) {
>> +		pr_warn("Could not proceed with PMD tests\n");
>> +		return;
>> +	}
>> +
>> +	pmd = mk_pmd(page, prot);
>> +	WARN_ON(!pmd_same(pmd, pmd));
>> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
>> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
>> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
>> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
>> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
>> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pmd_bad().
>> +	 */
>> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>> +}
>> +#else
>> +static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
>> +#endif
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void pud_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> +	pud_t pud;
>> +
>> +	/*
>> +	 * Memory block here must be PUD_SIZE aligned. Abort this
>> +	 * test in case we could not allocate such a memory block.
>> +	 */
>> +	if (!pud_aligned) {
>> +		pr_warn("Could not proceed with PUD tests\n");
>> +		return;
>> +	}
>> +
>> +	pud = pfn_pud(page_to_pfn(page), prot);
>> +	WARN_ON(!pud_same(pud, pud));
>> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
>> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
>> +	WARN_ON(pud_write(pud_wrprotect(pud)));
>> +	WARN_ON(pud_young(pud_mkold(pud)));
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pud_bad().
>> +	 */
>> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
>> +#endif
>> +}
>> +#else
>> +static void pud_basic_tests(struct page *page, pgprot_t prot) { }
>> +#endif
>> +
>> +static void p4d_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> +	p4d_t p4d;
>> +
>> +	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
>> +	WARN_ON(!p4d_same(p4d, p4d));
>> +}
>> +
>> +static void pgd_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> +	pgd_t pgd;
>> +
>> +	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
>> +	WARN_ON(!pgd_same(pgd, pgd));
>> +}
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(pud_t *pudp)
>> +{
>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>> +	pud_clear(pudp);
>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>> +}
>> +
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pud_bad().
>> +	 */
>> +	pmd_clear(pmdp);
>> +	pud_clear(pudp);
>> +	pud_populate(mm, pudp, pmdp);
>> +	WARN_ON(pud_bad(READ_ONCE(*pudp)));
>> +}
>> +#else
>> +static void pud_clear_tests(pud_t *pudp) { }
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> +}
>> +#endif
>> +
>> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
>> +static void p4d_clear_tests(p4d_t *p4dp)
>> +{
>> +	memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
>> +	p4d_clear(p4dp);
>> +	WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
>> +}
>> +
>> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>> +{
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as p4d_bad().
>> +	 */
>> +	pud_clear(pudp);
>> +	p4d_clear(p4dp);
>> +	p4d_populate(mm, p4dp, pudp);
>> +	WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
>> +}
>> +#else
>> +static void p4d_clear_tests(p4d_t *p4dp) { }
>> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>> +{
>> +}
>> +#endif
>> +
>> +#ifndef __PAGETABLE_P4D_FOLDED
>> +static void pgd_clear_tests(pgd_t *pgdp)
>> +{
>> +	memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
>> +	pgd_clear(pgdp);
>> +	WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
>> +}
>> +
>> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>> +{
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pgd_bad().
>> +	 */
>> +	p4d_clear(p4dp);
>> +	pgd_clear(pgdp);
>> +	pgd_populate(mm, pgdp, p4dp);
>> +	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
>> +}
>> +#else
>> +static void pgd_clear_tests(pgd_t *pgdp) { }
>> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>> +{
>> +}
>> +#endif
> 
> This will not work if p4d is folded at runtime, as on x86-64 and s390.
> 
> Here's the fixup. It should work for both x86-64 and s390, but I only
> tested on x86-64:
> 
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 52e5f5f2240d..b882792a3999 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
>  #define pgtable_l5_enabled() 0
>  #endif /* CONFIG_X86_5LEVEL */
>  
> +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
> +
>  extern unsigned int pgdir_shift;
>  extern unsigned int ptrs_per_p4d;
>  
> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
> index f15be8a73723..206fe3334a28 100644
> --- a/mm/arch_pgtable_test.c
> +++ b/mm/arch_pgtable_test.c
> @@ -193,9 +193,11 @@ static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>  }
>  #endif
>  
> -#ifndef __PAGETABLE_P4D_FOLDED
>  static void pgd_clear_tests(pgd_t *pgdp)
>  {
> +	if (mm_p4d_folded(mm))
> +		return;
> +
>  	memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
>  	pgd_clear(pgdp);
>  	WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
> @@ -203,6 +205,9 @@ static void pgd_clear_tests(pgd_t *pgdp)
>  
>  static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>  {
> +	if (mm_p4d_folded(mm))
> +		return;
> +
>  	/*
>  	 * This entry points to next level page table page.
>  	 * Hence this must not qualify as pgd_bad().
> @@ -212,12 +217,6 @@ static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>  	pgd_populate(mm, pgdp, p4dp);
>  	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
>  }
> -#else
> -static void pgd_clear_tests(pgd_t *pgdp) { }
> -static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> -{
> -}
> -#endif

This makes sense for runtime cases but there is a problem here.

On arm64, pgd_populate(), which takes (pud_t *) rather than (p4d_t *) as
its last argument, will fail to build on certain configurations when not
wrapped in !__PAGETABLE_P4D_FOLDED.

./arch/arm64/include/asm/pgalloc.h:81:75: note:
expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
                                                                   ~~~~~~~^~~~
Wondering if this is something to be fixed on arm64 or a more general
problem. Will look into this further.

>  
>  static void pte_clear_tests(pte_t *ptep)
>  {
> 
>> +
>> +static void pte_clear_tests(pte_t *ptep)
>> +{
>> +	memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
>> +	pte_clear(NULL, 0, ptep);
>> +	WARN_ON(!pte_none(READ_ONCE(*ptep)));
>> +}
>> +
>> +static void pmd_clear_tests(pmd_t *pmdp)
>> +{
>> +	memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
>> +	pmd_clear(pmdp);
>> +	WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
>> +}
>> +
>> +static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
>> +			       pgtable_t pgtable)
>> +{
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pmd_bad().
>> +	 */
>> +	pmd_clear(pmdp);
>> +	pmd_populate(mm, pmdp, pgtable);
>> +	WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
>> +}
>> +
>> +static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
>> +			    unsigned long nr_pages)
>> +{
>> +	unsigned long i, end_pfn = start_pfn + nr_pages;
>> +	struct page *page;
>> +
>> +	for (i = start_pfn; i < end_pfn; i++) {
>> +		if (!pfn_valid(i))
>> +			return false;
>> +
>> +		page = pfn_to_page(i);
>> +
>> +		if (page_zone(page) != z)
>> +			return false;
>> +
>> +		if (PageReserved(page))
>> +			return false;
>> +
>> +		if (page_count(page) > 0)
>> +			return false;
>> +
>> +		if (PageHuge(page))
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static struct page *alloc_gigantic_page(nodemask_t *nodemask,
>> +					int nid, gfp_t gfp_mask, int order)
>> +{
>> +	struct zonelist *zonelist;
>> +	struct zone *zone;
>> +	struct zoneref *z;
>> +	enum zone_type zonesel;
>> +	unsigned long ret, pfn, flags, nr_pages;
>> +
>> +	nr_pages = 1UL << order;
>> +	zonesel = gfp_zone(gfp_mask);
>> +	zonelist = node_zonelist(nid, gfp_mask);
>> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
>> +		spin_lock_irqsave(&zone->lock, flags);
>> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
>> +			if (pfn_range_valid(zone, pfn, nr_pages)) {
>> +				spin_unlock_irqrestore(&zone->lock, flags);
>> +				ret = alloc_contig_range(pfn, pfn + nr_pages,
>> +							 MIGRATE_MOVABLE,
>> +							 gfp_mask);
>> +				if (!ret)
>> +					return pfn_to_page(pfn);
>> +				spin_lock_irqsave(&zone->lock, flags);
>> +			}
>> +			pfn += nr_pages;
>> +		}
>> +		spin_unlock_irqrestore(&zone->lock, flags);
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static struct page *alloc_mapped_page(void)
>> +{
>> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
>> +	struct page *page = NULL;
>> +
>> +	page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
>> +				   gfp_mask, get_order(PUD_SIZE));
>> +	if (page) {
>> +		pud_aligned = true;
>> +		pmd_aligned = true;
>> +		return page;
>> +	}
>> +
>> +	page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
>> +	if (page) {
>> +		pmd_aligned = true;
>> +		return page;
>> +	}
>> +	return alloc_page(gfp_mask);
>> +}
>> +
>> +static void free_mapped_page(struct page *page)
>> +{
>> +	if (pud_aligned) {
>> +		unsigned long pfn = page_to_pfn(page);
>> +
>> +		free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
>> +		return;
>> +	}
>> +
>> +	if (pmd_aligned) {
>> +		int order = get_order(PMD_SIZE);
>> +
>> +		free_pages((unsigned long)page_address(page), order);
>> +		return;
>> +	}
>> +	free_page((unsigned long)page_address(page));
>> +}
>> +
>> +static int __init arch_pgtable_tests_init(void)
>> +{
>> +	struct mm_struct *mm;
>> +	struct page *page;
>> +	pgd_t *pgdp;
>> +	p4d_t *p4dp, *saved_p4dp;
>> +	pud_t *pudp, *saved_pudp;
>> +	pmd_t *pmdp, *saved_pmdp;
>> +	pte_t *ptep, *saved_ptep;
>> +	pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
>> +	unsigned long vaddr = VADDR_TEST;
>> +
>> +	mm = mm_alloc();
>> +	if (!mm) {
>> +		pr_err("mm_struct allocation failed\n");
>> +		return 1;
>> +	}
>> +
>> +	page = alloc_mapped_page();
>> +	if (!page) {
>> +		pr_err("memory allocation failed\n");
>> +		return 1;
>> +	}
>> +
>> +	pgdp = pgd_offset(mm, vaddr);
>> +	p4dp = p4d_alloc(mm, pgdp, vaddr);
>> +	pudp = pud_alloc(mm, p4dp, vaddr);
>> +	pmdp = pmd_alloc(mm, pudp, vaddr);
>> +	ptep = pte_alloc_map(mm, pmdp, vaddr);
>> +
>> +	/*
>> +	 * Save all the page table page addresses as the page table
>> +	 * entries will be used for testing with random or garbage
>> +	 * values. These saved addresses will be used for freeing
>> +	 * page table pages.
>> +	 */
>> +	saved_p4dp = p4d_offset(pgdp, 0UL);
>> +	saved_pudp = pud_offset(p4dp, 0UL);
>> +	saved_pmdp = pmd_offset(pudp, 0UL);
>> +	saved_ptep = pte_offset_map(pmdp, 0UL);
>> +
>> +	pte_basic_tests(page, prot);
>> +	pmd_basic_tests(page, prot);
>> +	pud_basic_tests(page, prot);
>> +	p4d_basic_tests(page, prot);
>> +	pgd_basic_tests(page, prot);
>> +
>> +	pte_clear_tests(ptep);
>> +	pmd_clear_tests(pmdp);
>> +	pud_clear_tests(pudp);
>> +	p4d_clear_tests(p4dp);
>> +	pgd_clear_tests(pgdp);
>> +
>> +	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
> 
> This is not correct for architectures that define pgtable_t as a pte_t
> pointer, not a struct page pointer.

Right, a grep on the source confirms that.

These platforms define pgtable_t as struct page *

arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
arch/ia64/include/asm/page.h:  typedef struct page *pgtable_t;
arch/ia64/include/asm/page.h:    typedef struct page *pgtable_t;
arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
arch/um/include/asm/page.h:typedef struct page *pgtable_t;
arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;

These platforms define pgtable_t as pte_t *

arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;

Should we have two pmd_populate_tests() definitions with different
arguments (struct page pointer or pte_t pointer) and then call the
appropriate one after detecting the given platform?

> 
>> +	pud_populate_tests(mm, pudp, pmdp);
>> +	p4d_populate_tests(mm, p4dp, pudp);
>> +	pgd_populate_tests(mm, pgdp, p4dp);
> 
> This is wrong. All the p?dp pointers point to the second entry in their
> page table page. That is not a valid page table pointer and triggers
> p?d_bad() on x86.

Yeah, these are second entries because of the way we create the page table.
But I guess it's applicable only to the second argument in all these above
cases, because the first argument can be any valid entry at the previous
page table level.

> 
> Use saved_p?dp instead.

It works on x86. Will test on arm64 and update.
Kirill A. Shutemov Sept. 5, 2019, 8:59 a.m. UTC | #6
On Thu, Sep 05, 2019 at 01:48:27PM +0530, Anshuman Khandual wrote:
> >> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
> > 
> > What is special about this address? How do you know if it is not occupied
> > yet?
> 
> We are creating the page table from scratch for a given random virtual
> address 'VADDR_TEST' after allocating an mm_struct. Hence nothing is
> occupied just yet. There is nothing special about this address, just that
> it tries to ensure that page table entries are created at some offset from
> the beginning of the respective page table page at each level. The idea is
> to have a more representative page table structure for the test.

Why is P4D_SIZE missing?

Are you sure it will not land in the kernel address space on any arch?

I think a more robust way to deal with this would be to use
get_unmapped_area() instead of a fixed address.
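
Something like this in the init path, perhaps (a rough sketch; note the
assumption that get_unmapped_area() searches current->mm, i.e. the address
space of the task running the test, which would have to be acceptable
here):

	vaddr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
	if (IS_ERR_VALUE(vaddr)) {
		pr_err("could not find a free virtual address\n");
		return 1;
	}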

> This makes sense for runtime cases but there is a problem here.
> 
> On arm64, pgd_populate(), which takes (pud_t *) rather than (p4d_t *) as
> its last argument, will fail to build on certain configurations when not
> wrapped in !__PAGETABLE_P4D_FOLDED.
> 
> ./arch/arm64/include/asm/pgalloc.h:81:75: note:
> expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
> but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
> static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
>                                                                    ~~~~~~~^~~~
> Wondering if this is something to be fixed on arm64 or a more general
> problem. Will look into this further.

I think you need to wrap this in #ifndef __ARCH_HAS_5LEVEL_HACK.
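
I.e., keep the runtime check but guard the definitions, something like this
(a sketch; note that 'mm' would also have to be passed into
pgd_clear_tests() for the mm_p4d_folded() check to work):

	#ifndef __ARCH_HAS_5LEVEL_HACK
	static void pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp)
	{
		/* skip when p4d is folded at runtime, as on x86-64/s390 */
		if (mm_p4d_folded(mm))
			return;

		memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
		pgd_clear(pgdp);
		WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
	}
	#else
	static void pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) { }
	#endif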

> >> +	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
> > 
> > This is not correct for architectures that define pgtable_t as a pte_t
> > pointer, not a struct page pointer.
> 
> Right, a grep on the source confirms that.
> 
> These platforms define pgtable_t as struct page *
> 
> arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
> arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
> arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
> arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
> arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
> arch/ia64/include/asm/page.h:  typedef struct page *pgtable_t;
> arch/ia64/include/asm/page.h:    typedef struct page *pgtable_t;
> arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
> arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
> arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
> arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
> arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
> arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
> arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
> arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
> arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
> arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
> arch/um/include/asm/page.h:typedef struct page *pgtable_t;
> arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
> arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
> arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;
> 
> These platforms define pgtable_t as pte_t *
> 
> arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
> arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
> arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
> arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;
> 
> Should we have two pmd_populate_tests() definitions with different
> arguments (struct page pointer or pte_t pointer) and then call the
> appropriate one after detecting the given platform?

Use pte_alloc_one() instead of alloc_mapped_page() to allocate the page
table.
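
I.e., something along these lines (a sketch):

	pgtable_t pgtable = pte_alloc_one(mm);

	if (!pgtable)
		return 1;
	pmd_populate_tests(mm, pmdp, pgtable);
	...
	pte_free(mm, pgtable);

That way pmd_populate() and pte_free() always see the arch's native
pgtable_t, whether it is a struct page pointer or a pte_t pointer.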

> >> +	pud_populate_tests(mm, pudp, pmdp);
> >> +	p4d_populate_tests(mm, p4dp, pudp);
> >> +	pgd_populate_tests(mm, pgdp, p4dp);
> > 
> > This is wrong. All the p?dp pointers point to the second entry in their
> > page table page. That is not a valid page table pointer and triggers
> > p?d_bad() on x86.
> 
> Yeah, these are second entries because of the way we create the page table.
> But I guess it's applicable only to the second argument in all these above
> cases, because the first argument can be any valid entry at the previous
> page table level.

Yes:

@@ -397,9 +396,9 @@ static int __init arch_pgtable_tests_init(void)
 	pgd_clear_tests(pgdp);
 
 	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
-	pud_populate_tests(mm, pudp, pmdp);
-	p4d_populate_tests(mm, p4dp, pudp);
-	pgd_populate_tests(mm, pgdp, p4dp);
+	pud_populate_tests(mm, pudp, saved_pmdp);
+	p4d_populate_tests(mm, p4dp, saved_pudp);
+	pgd_populate_tests(mm, pgdp, saved_p4dp);
 
 	p4d_free(mm, saved_p4dp);
 	pud_free(mm, saved_pudp);
Anshuman Khandual Sept. 5, 2019, 9:18 a.m. UTC | #7
On 09/05/2019 01:46 AM, Gerald Schaefer wrote:
> On Tue,  3 Sep 2019 13:31:46 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> This adds a test module which validates architecture page table helpers
>> and accessors for compliance with expected generic MM semantics. This will
>> help various architectures validate changes to existing page table helpers
>> or additions of new ones.
>>
>> The test page table, and the memory pages used to create its entries at
>> various levels, are all allocated from system memory with the required
>> alignments. If memory pages with the required size and alignment cannot be
>> allocated, the individual tests that depend on them are skipped.
> 
> This looks very useful, thanks. Of course, s390 is quite special and does
> not work nicely with this patch (yet), mostly because of our dynamic page
> table levels/folding. Still need to figure out what can be fixed in the arch

Hmm.

> code and what would need to be changed in the test module. See below for some
> generic comments/questions.

Sure.

> 
> At least one real bug in the s390 code was already revealed by this, which
> is very nice. In pmd/pud_bad(), we also check large pmds/puds for sanity,
> instead of reporting them as bad, which is apparently not what is expected.

Hmm, so it has already started being useful :)

> 
> [...]
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry)			= An old and not a young entry
>> + * mkyoung(entry)		= A young and not an old entry
>> + * mkdirty(entry)		= A dirty and not a clean entry
>> + * mkclean(entry)		= A clean and not a dirty entry
>> + * mkwrite(entry)		= A write and not a write protected entry
>> + * wrprotect(entry)		= A write protected and not a write entry
>> + * pxx_bad(entry)		= A mapped and non-table entry
>> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
>> + */
>> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
> 
> Why is P4D_SIZE missing in the VADDR_TEST calculation?

This was an arbitrary virtual address chosen to generate a representative
page table structure for the test. As there is a default (PGDIR_SIZE) for
P4D_SIZE on platforms which really do not have a P4D level, it should be okay
to add P4D_SIZE to the above calculation.
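
I.e. something like this (a sketch; on platforms without a real P4D level
the generic headers make P4D_SIZE equal PGDIR_SIZE):

#define VADDR_TEST	(PGDIR_SIZE + P4D_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)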

> 
> [...]
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(pud_t *pudp)
>> +{
>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>> +	pud_clear(pudp);
>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>> +}
> 
> For pgd/p4d/pud_clear(), we only clear if the page table level is present
> and not folded. The memset() here overwrites the table type bits, so
> pud_clear() will not clear anything on s390 and the pud_none() check will
> fail.
> Would it be possible to OR a (larger) random value into the table, so that
> the lower 12 bits would be preserved?

So the suggestion is that instead of doing memset() on the entry with
RANDOM_NZVALUE, it should OR in a large random value while preserving the
lower 12 bits. Hmm, this should still do the trick for other platforms, as
they just need a non-zero value. So on s390, the lower 12 bits of the page
table entry already hold a valid value when entering this function, which
would make sure that pud_clear() really does clear the entry ?

> 
>> +
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pud_bad().
>> +	 */
>> +	pmd_clear(pmdp);
>> +	pud_clear(pudp);
>> +	pud_populate(mm, pudp, pmdp);
>> +	WARN_ON(pud_bad(READ_ONCE(*pudp)));
>> +}
> 
> This will populate the pud with a pmd pointer that does not point to the
> beginning of the pmd table, but to the second entry (because of how
> VADDR_TEST is constructed). This will result in failing pud_bad() check
> on s390. Not sure why/how it works on other archs, but would it be possible
> to align pmdp down to the beginning of the pmd table (and similar for the
> other pxd_populate_tests)?

Right, that was a problem. Will fix it by using the saved pointers that are
used for freeing the page table pages at the end; those always point to the
beginning of a page table page.

> 
> [...]
>> +
>> +	p4d_free(mm, saved_p4dp);
>> +	pud_free(mm, saved_pudp);
>> +	pmd_free(mm, saved_pmdp);
>> +	pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
> 
> pgtable_t is arch-specific, and on s390 it is not a struct page pointer,
> but a pte pointer. So this will go wrong, also on all other archs (if any)
> where pgtable_t is not struct page.
> Would it be possible to use pte_free_kernel() instead, and just pass
> saved_ptep directly?

It makes sense, will change.
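
I.e. (sketch):

	pte_free_kernel(mm, saved_ptep);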
Gerald Schaefer Sept. 5, 2019, 5:06 p.m. UTC | #8
On Thu, 5 Sep 2019 14:48:14 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> > [...]  
> >> +
> >> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> >> +static void pud_clear_tests(pud_t *pudp)
> >> +{
> >> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> >> +	pud_clear(pudp);
> >> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
> >> +}  
> > 
> > For pgd/p4d/pud_clear(), we only clear if the page table level is present
> > and not folded. The memset() here overwrites the table type bits, so
> > pud_clear() will not clear anything on s390 and the pud_none() check will
> > fail.
> > Would it be possible to OR a (larger) random value into the table, so that
> > the lower 12 bits would be preserved?  
> 
> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
> it should OR a large random value preserving lower 12 bits. Hmm, this should
> still do the trick for other platforms, they just need non zero value. So on
> s390, the lower 12 bits on the page table entry already has valid value while
> entering this function which would make sure that pud_clear() really does
> clear the entry ?

Yes, in theory the table entry on s390 would have the type set in the last
4 bits, so preserving those would be enough. If it does not conflict with
others, I would still suggest preserving all 12 bits since those would contain
arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
would also work with the memset, but for consistency I think the same logic
should be used in all pxd_clear_tests.

However, there is another issue on s390 which will make this only work
for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
mm_alloc() will only give you a 3-level page table initially on s390.
This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
both see the pud level (of course this also affects other tests).

Not sure yet how to fix this, i.e. how to initialize/update the page table
to 5 levels. We can handle 5 level page tables, and it would be good if
all levels could be tested, but using mm_alloc() to establish the page
tables might not work on s390. One option could be to provide an arch-hook
or weak function to allocate/initialize the mm.

IIUC, the (dummy) mm is really only needed to provide an mm->pgd as starting
point, right?

Regards,
Gerald
Anshuman Khandual Sept. 6, 2019, 6:28 a.m. UTC | #9
On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
> On Thu, 5 Sep 2019 14:48:14 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>>> [...]  
>>>> +
>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>> +static void pud_clear_tests(pud_t *pudp)
>>>> +{
>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>> +	pud_clear(pudp);
>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>> +}  
>>>
>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>> and not folded. The memset() here overwrites the table type bits, so
>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>> fail.
>>> Would it be possible to OR a (larger) random value into the table, so that
>>> the lower 12 bits would be preserved?  
>>
>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>> still do the trick for other platforms, they just need non zero value. So on
>> s390, the lower 12 bits on the page table entry already has valid value while
>> entering this function which would make sure that pud_clear() really does
>> clear the entry ?
> 
> Yes, in theory the table entry on s390 would have the type set in the last
> 4 bits, so preserving those would be enough. If it does not conflict with
> others, I would still suggest preserving all 12 bits since those would contain
> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
> would also work with the memset, but for consistency I think the same logic
> should be used in all pxd_clear_tests.

Makes sense but..

There is a small challenge with this. Modifying individual bits on a given
page table entry from generic code like this test case is a bit tricky. That
is because there are not enough helpers to create entries with an absolute
value. This would have been easier if all the platforms provided functions
like __pxx(), which is not the case now. Otherwise something like this should
have worked:


pud_t pud = READ_ONCE(*pudp);
pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
WRITE_ONCE(*pudp, pud);

But __pud() will fail to build in many platforms.

The other alternative would be to make sure memset() touches all bytes
except those holding the lower 12 bits, which depends on endianness. Since
s390 has a fixed endianness, we could still use one of the following, which
would hold good for the others as well.

memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);

OR

memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
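
(Strictly speaking, 12 bits span one and a half bytes, so a byte-granular
memset() can at best preserve the low 16 bits. On big-endian s390 the
low-order bytes come last, i.e. something like this sketch:

	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 2);

while the offset variant would suit little-endian and also needs a byte
pointer, e.g. memset((char *)pudp + 2, ...), since pudp + 2 would advance
by two pud_t entries rather than two bytes.)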

> 
> However, there is another issue on s390 which will make this only work
> for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
> mm_alloc() will only give you a 3-level page table initially on s390.
> This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
> both see the pud level (of course this also affects other tests).

Got it.

> 
> Not sure yet how to fix this, i.e. how to initialize/update the page table
> to 5 levels. We can handle 5 level page tables, and it would be good if
> all levels could be tested, but using mm_alloc() to establish the page
> tables might not work on s390. One option could be to provide an arch-hook
> or weak function to allocate/initialize the mm.

Sure, got it. Though I plan to add some arch-specific tests or an init sequence
like the above later on, for now the idea is to get the smallest possible set
of test cases which builds and runs on all platforms without requiring any
arch-specific hooks or special casing (#ifdef), so it can be agreed upon
broadly and accepted.

Do you think this is absolutely necessary on s390 for the very first set of
test cases, or can we add it later on as an improvement ?

> 
> IIUC, the (dummy) mm is really only needed to provide an mm->pgd as starting
> point, right?

Right.
Anshuman Khandual Sept. 6, 2019, 7:03 a.m. UTC | #10
On 09/05/2019 02:29 PM, Kirill A. Shutemov wrote:
> On Thu, Sep 05, 2019 at 01:48:27PM +0530, Anshuman Khandual wrote:
>>>> +#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
>>>
>>> What is special about this address? How do you know if it is not occupied
>>> yet?
>>
>> We are creating the page table from scratch after allocating an mm_struct
>> for a given random virtual address 'VADDR_TEST'. Hence nothing is occupied
>> just yet. There is nothing special about this address, just that it tries
>> to ensure the page table entries are being created with some offset from
>> beginning of respective page table page at all levels ? The idea is to
>> have a more representative form of page table structure for test.
> 
> Why P4D_SIZE is missing?

That was an omission, even though I was wondering whether it would be
applicable or even make sense on platforms which don't have a real P4D.

> 
> Are you sure it will not land into kernel address space on any arch?

Can it even cross the user virtual address range with just a single span
at each page table level ? TBH I did not think about that possibility.

> 
> I think more robust way to deal with this would be using
> get_unmapped_area() instead of fixed address.

Makes sense, and it's probably better to get a virtual address which
is known to have been checked against all boundary conditions. Will
explore get_unmapped_area() in this regard.

> 
>> This makes sense for runtime cases but there is a problem here.
>>
>> On arm64, pgd_populate() which takes (pud_t *) as last argument instead of
>> (p4d_t *) will fail to build when not wrapped in !__PAGETABLE_P4D_FOLDED
>> on certain configurations.
>>
>> ./arch/arm64/include/asm/pgalloc.h:81:75: note:
>> expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
>> but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
>> static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
>>                                                                    ~~~~~~~^~~~
>> Wondering if this is something to be fixed on arm64 or its more general
>> problem. Will look into this further.
> 
> I think you need wrap this into #ifndef __ARCH_HAS_5LEVEL_HACK.

Okay.
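
I.e. something like (sketch):

#ifndef __ARCH_HAS_5LEVEL_HACK
	pgd_populate_tests(mm, pgdp, saved_p4dp);
#endif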

> 
>>>> +	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
>>>
>>> This is not correct for architectures that defines pgtable_t as pte_t
>>> pointer, not struct page pointer.
>>
>> Right, a grep on the source confirms that.
>>
>> These platforms define pgtable_t as struct page *
>>
>> arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/ia64/include/asm/page.h:  typedef struct page *pgtable_t;
>> arch/ia64/include/asm/page.h:    typedef struct page *pgtable_t;
>> arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
>> arch/um/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
>> arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;
>>
>> These platforms define pgtable_t as pte_t *
>>
>> arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
>> arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
>> arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
>> arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;
>>
>> Should we need have two pmd_populate_tests() definitions with
>> different arguments (struct page pointer or pte_t pointer) and then
>> call either one after detecting the given platform ?
> 
> Use pte_alloc_one() instead of alloc_mapped_page() to allocate the page
> table.

Right, the PTE page table page should come from pte_alloc_one() instead of
directly from a struct page. The functions pte_alloc_one() and pte_free()
operate on (struct page or pte_t) pointers depending on the applicable
pgtable_t definition for the platform. Will fix it.

> 
>>>> +	pud_populate_tests(mm, pudp, pmdp);
>>>> +	p4d_populate_tests(mm, p4dp, pudp);
>>>> +	pgd_populate_tests(mm, pgdp, p4dp);
>>>
>>> This is wrong. All p?dp points to the second entry in page table entry.
>>> This is not valid pointer for page table and triggers p?d_bad() on x86.
>>
>> Yeah these are second entries because of the way we create the page table.
>> But I guess its applicable only to the second argument in all these above
>> cases because the first argument can be any valid entry on previous page
>> table level.
> 
> Yes:
> 
> @@ -397,9 +396,9 @@ static int __init arch_pgtable_tests_init(void)
>  	pgd_clear_tests(pgdp);
>  
>  	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
> -	pud_populate_tests(mm, pudp, pmdp);
> -	p4d_populate_tests(mm, p4dp, pudp);
> -	pgd_populate_tests(mm, pgdp, p4dp);
> +	pud_populate_tests(mm, pudp, saved_pmdp);
> +	p4d_populate_tests(mm, p4dp, saved_pudp);
> +	pgd_populate_tests(mm, pgdp, saved_p4dp);

Sure.
Gerald Schaefer Sept. 6, 2019, 7:03 p.m. UTC | #11
On Fri, 6 Sep 2019 11:58:59 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
> > On Thu, 5 Sep 2019 14:48:14 +0530
> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> >   
> >>> [...]    
> >>>> +
> >>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> >>>> +static void pud_clear_tests(pud_t *pudp)
> >>>> +{
> >>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> >>>> +	pud_clear(pudp);
> >>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
> >>>> +}    
> >>>
> >>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
> >>> and not folded. The memset() here overwrites the table type bits, so
> >>> pud_clear() will not clear anything on s390 and the pud_none() check will
> >>> fail.
> >>> Would it be possible to OR a (larger) random value into the table, so that
> >>> the lower 12 bits would be preserved?    
> >>
> >> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
> >> it should OR a large random value preserving lower 12 bits. Hmm, this should
> >> still do the trick for other platforms, they just need non zero value. So on
> >> s390, the lower 12 bits on the page table entry already has valid value while
> >> entering this function which would make sure that pud_clear() really does
> >> clear the entry ?  
> > 
> > Yes, in theory the table entry on s390 would have the type set in the last
> > 4 bits, so preserving those would be enough. If it does not conflict with
> > others, I would still suggest preserving all 12 bits since those would contain
> > arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
> > would also work with the memset, but for consistency I think the same logic
> > should be used in all pxd_clear_tests.  
> 
> Makes sense but..
> 
> There is a small challenge with this. Modifying individual bits on a given
> page table entry from generic code like this test case is bit tricky. That
> is because there are not enough helpers to create entries with an absolute
> value. This would have been easier if all the platforms provided functions
> like __pxx() which is not the case now. Otherwise something like this should
> have worked.
> 
> 
> pud_t pud = READ_ONCE(*pudp);
> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
> WRITE_ONCE(*pudp, pud);
> 
> But __pud() will fail to build in many platforms.

Hmm, I simply used this on my system to make pud_clear_tests() work, not
sure if it works on all archs:

pud_val(*pudp) |= RANDOM_NZVALUE;

> 
> The other alternative will be to make sure memset() happens on all other
> bits except the lower 12 bits which will depend on endianness. If s390
> has a fixed endianness, we can still use either of them which will hold
> good for others as well.
> 
> memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);
> 
> OR
> 
> memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
> 
> > 
> > However, there is another issue on s390 which will make this only work
> > for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
> > mm_alloc() will only give you a 3-level page table initially on s390.
> > This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
> > both see the pud level (of course this also affects other tests).  
> 
> Got it.
> 
> > 
> > Not sure yet how to fix this, i.e. how to initialize/update the page table
> > to 5 levels. We can handle 5 level page tables, and it would be good if
> > all levels could be tested, but using mm_alloc() to establish the page
> > tables might not work on s390. One option could be to provide an arch-hook
> > or weak function to allocate/initialize the mm.  
> 
> Sure, got it. Though I plan to do add some arch specific tests or init sequence
> like the above later on but for now the idea is to get the smallest possible set
> of test cases which builds and runs on all platforms without requiring any arch
> specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted.
> 
> Do you think this is absolutely necessary on s390 for the very first set of test
> cases or we can add this later on as an improvement ?

It can be added later, no problem. I did not expect this to work flawlessly
on s390 right from the start anyway, with all our peculiarities, so don't
let this hinder you. I might come up with an add-on patch later.

Actually, using get_unmapped_area() as suggested by Kirill could also
solve this issue. We do create a new mm with 3-level page tables on s390,
and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
arch_get_unmapped_area(), depending on the addr. But I currently don't
see how / where arch_get_unmapped_area() is set up for such a dummy mm
created by mm_alloc().

Regards,
Gerald
Anshuman Khandual Sept. 9, 2019, 6:26 a.m. UTC | #12
On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
> On Fri, 6 Sep 2019 11:58:59 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>   
>>>>> [...]    
>>>>>> +
>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>> +{
>>>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>> +	pud_clear(pudp);
>>>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>> +}    
>>>>>
>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>> fail.
>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>> the lower 12 bits would be preserved?    
>>>>
>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>> still do the trick for other platforms, they just need non zero value. So on
>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>> entering this function which would make sure that pud_clear() really does
>>>> clear the entry ?  
>>>
>>> Yes, in theory the table entry on s390 would have the type set in the last
>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>> others, I would still suggest preserving all 12 bits since those would contain
>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>> would also work with the memset, but for consistency I think the same logic
>>> should be used in all pxd_clear_tests.  
>>
>> Makes sense but..
>>
>> There is a small challenge with this. Modifying individual bits on a given
>> page table entry from generic code like this test case is bit tricky. That
>> is because there are not enough helpers to create entries with an absolute
>> value. This would have been easier if all the platforms provided functions
>> like __pxx() which is not the case now. Otherwise something like this should
>> have worked.
>>
>>
>> pud_t pud = READ_ONCE(*pudp);
>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>> WRITE_ONCE(*pudp, pud);
>>
>> But __pud() will fail to build in many platforms.
> 
> Hmm, I simply used this on my system to make pud_clear_tests() work, not
> sure if it works on all archs:
> 
> pud_val(*pudp) |= RANDOM_NZVALUE;

Which compiles on arm64 but then fails on x86 because of the way pmd_val()
has been defined there. On arm64 and s390 (like many others) pmd_val() is
a macro whose expansion can be used as an lvalue, but that is not true for
some other platforms like x86.

arch/arm64/include/asm/pgtable-types.h:	#define pmd_val(x)	((x).pmd)
arch/s390/include/asm/page.h:		#define pmd_val(x)	((x).pmd)
arch/x86/include/asm/pgtable.h:		#define pmd_val(x)       native_pmd_val(x)

static inline pmdval_t native_pmd_val(pmd_t pmd)
{
        return pmd.pmd;
}

Unless I am mistaken, the return value from this function cannot be used as
an lvalue for further assignments.

mm/arch_pgtable_test.c: In function ‘pud_clear_tests’:
mm/arch_pgtable_test.c:156:17: error: lvalue required as left operand of assignment
  pud_val(*pudp) |= RANDOM_ORVALUE;
                 ^~
AFAICS pxx_val() was never intended to be used as an lvalue; using it that way
just happens to work on those platforms which define it as a macro. It is
meant only to provide the value of an entry as determined by the platform.

In principle, an entry's pxx_val() was not supposed to be modified directly from
generic code without going through the platform helpers for any specific state
change (write, old, dirty, special, huge etc). The current use case is a
deviation from that.

I originally went with memset() just to load up the entries with a non-zero
value so that we know pxx_clear() is really doing the clearing. The same is
being followed for all pxx_same() checks.

Another way of fixing the problem would be to mark the entries with known
attributes like write/young/huge etc instead, which will certainly create
non-zero entries. We can do that for the pxx_clear() and pxx_same() tests
and drop RANDOM_NZVALUE completely. Does that sound good ?
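
For instance, at the pte level something like this (just a sketch, following
the WRITE_ONCE() style used above) would create a known non-zero entry via
existing helpers:

	pte_t pte = READ_ONCE(*ptep);

	pte = pte_mkyoung(pte_mkdirty(pte));
	WRITE_ONCE(*ptep, pte);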

> 
>>
>> The other alternative will be to make sure memset() happens on all other
>> bits except the lower 12 bits which will depend on endianness. If s390
>> has a fixed endianness, we can still use either of them which will hold
>> good for others as well.
>>
>> memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>> OR
>>
>> memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>>>
>>> However, there is another issue on s390 which will make this only work
>>> for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
>>> mm_alloc() will only give you a 3-level page table initially on s390.
>>> This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
>>> both see the pud level (of course this also affects other tests).  
>>
>> Got it.
>>
>>>
>>> Not sure yet how to fix this, i.e. how to initialize/update the page table
>>> to 5 levels. We can handle 5 level page tables, and it would be good if
>>> all levels could be tested, but using mm_alloc() to establish the page
>>> tables might not work on s390. One option could be to provide an arch-hook
>>> or weak function to allocate/initialize the mm.  
>>
>> Sure, got it. Though I plan to do add some arch specific tests or init sequence
>> like the above later on but for now the idea is to get the smallest possible set
>> of test cases which builds and runs on all platforms without requiring any arch
>> specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted.
>>
>> Do you think this is absolutely necessary on s390 for the very first set of test
>> cases or we can add this later on as an improvement ?
> 
> It can be added later, no problem. I did not expect this to work flawlessly
> on s390 right from the start anyway, with all our peculiarities, so don't
> let this hinder you. I might come up with an add-on patch later.

Sure.

> 
> Actually, using get_unmapped_area() as suggested by Kirill could also
> solve this issue. We do create a new mm with 3-level page tables on s390,
> and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
> arch_get_unmapped_area(), depending on the addr. But I currently don't
> see how / where arch_get_unmapped_area() is set up for such a dummy mm
> created by mm_alloc().

Normally this is set during program loading, but we can set it up explicitly
for the test mm_struct if we need to. There are some other challenges, though.

load_[aout|elf|flat|..]_binary()
	setup_new_exec()
		arch_pick_mmap_layout().

I did some initial experiments around get_unmapped_area(). Seems a bit tricky
to get it working on a pure 'test' mm_struct. It expects a real user context
in the form of current->mm.

get_unmapped_area()
{
	....
	get_area = current->mm->get_unmapped_area;
	....
	addr = get_area(file, addr, len, pgoff, flags); {
		....
		struct mm_struct *mm = current->mm;
		....
		if (addr) {
			...
			vma = find_vma_prev(mm, addr, &prev);
		}
		....
		vm_unmapped_area() {
			struct mm_struct *mm = current->mm;
			....
			/* Walks across mm->mm_rb.rb_node */
		}
	}
	....
}	

A simple call like get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0) to get an
address fails right away on current->mm->get_unmapped_area, which does
not have a valid value in kernel context.

There might be two methods to get around this problem

1) Write a custom get_unmapped_area() imitating the real one but going
   around the problem by taking an appropriately initialized mm_struct
   instead of current->mm.

2) Create a dummy user task with a dummy mm, switch the 'current' context
   before calling into get_unmapped_area() and switch back again. Don't
   know if this is even possible.

Wondering if this might deviate too much from the original goal of
testing the page table helpers.

Looking back again at the proposed test vaddr, wondering what the real
problem would be in case it goes beyond the user address range ? Will
pxx_alloc() fail to create page table entries at the required level ?
Apart from skipping pgtable_page_ctor/dtor for page table pages, it might
not really affect any helpers as such.

VADDR_TEST (PGDIR_SIZE + [P4D_SIZE] + PUD_SIZE + PMD_SIZE + PAGE_SIZE)

OR

A random page-aligned address in the [FIRST_USER_ADDRESS..TASK_SIZE] range ?
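
For the latter, something like this could work (a sketch, assuming
get_random_long() and the usual user range macros):

	vaddr = FIRST_USER_ADDRESS +
		(get_random_long() % (TASK_SIZE - FIRST_USER_ADDRESS));
	vaddr &= PAGE_MASK;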
Kirill A. Shutemov Sept. 9, 2019, 3:13 p.m. UTC | #13
On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
> 
> 
> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
> > On Fri, 6 Sep 2019 11:58:59 +0530
> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> > 
> >> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
> >>> On Thu, 5 Sep 2019 14:48:14 +0530
> >>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> >>>   
> >>>>> [...]    
> >>>>>> +
> >>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> >>>>>> +static void pud_clear_tests(pud_t *pudp)
> >>>>>> +{
> >>>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> >>>>>> +	pud_clear(pudp);
> >>>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
> >>>>>> +}    
> >>>>>
> >>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
> >>>>> and not folded. The memset() here overwrites the table type bits, so
> >>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
> >>>>> fail.
> >>>>> Would it be possible to OR a (larger) random value into the table, so that
> >>>>> the lower 12 bits would be preserved?    
> >>>>
> >>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
> >>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
> >>>> still do the trick for other platforms, they just need non zero value. So on
> >>>> s390, the lower 12 bits on the page table entry already has valid value while
> >>>> entering this function which would make sure that pud_clear() really does
> >>>> clear the entry ?  
> >>>
> >>> Yes, in theory the table entry on s390 would have the type set in the last
> >>> 4 bits, so preserving those would be enough. If it does not conflict with
> >>> others, I would still suggest preserving all 12 bits since those would contain
> >>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
> >>> would also work with the memset, but for consistency I think the same logic
> >>> should be used in all pxd_clear_tests.  
> >>
> >> Makes sense but..
> >>
> >> There is a small challenge with this. Modifying individual bits on a given
> >> page table entry from generic code like this test case is bit tricky. That
> >> is because there are not enough helpers to create entries with an absolute
> >> value. This would have been easier if all the platforms provided functions
> >> like __pxx() which is not the case now. Otherwise something like this should
> >> have worked.
> >>
> >>
> >> pud_t pud = READ_ONCE(*pudp);
> >> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
> >> WRITE_ONCE(*pudp, pud);
> >>
> >> But __pud() will fail to build in many platforms.
> > 
> > Hmm, I simply used this on my system to make pud_clear_tests() work, not
> > sure if it works on all archs:
> > 
> > pud_val(*pudp) |= RANDOM_NZVALUE;
> 
> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
> has been defined there.

Use instead

	*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);

It *should* be more portable.
Gerald Schaefer Sept. 9, 2019, 4:51 p.m. UTC | #14
On Mon, 9 Sep 2019 11:56:50 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

[..]
> > 
> > Hmm, I simply used this on my system to make pud_clear_tests() work, not
> > sure if it works on all archs:
> > 
> > pud_val(*pudp) |= RANDOM_NZVALUE;  
> 
> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
> has been defined there. on arm64 and s390 (with many others) pmd_val() is
> a macro which still got the variable that can be used as lvalue but that is
> not true for some other platforms like x86.
> 
> arch/arm64/include/asm/pgtable-types.h:	#define pmd_val(x)	((x).pmd)
> arch/s390/include/asm/page.h:		#define pmd_val(x)	((x).pmd)
> arch/x86/include/asm/pgtable.h:		#define pmd_val(x)       native_pmd_val(x)
> 
> static inline pmdval_t native_pmd_val(pmd_t pmd)
> {
>         return pmd.pmd;
> }
> 
> Unless I am mistaken, the return value from this function can not be used as
> lvalue for future assignments.
> 
> mm/arch_pgtable_test.c: In function ‘pud_clear_tests’:
> mm/arch_pgtable_test.c:156:17: error: lvalue required as left operand of assignment
>   pud_val(*pudp) |= RANDOM_ORVALUE;
>                  ^~
> AFAICS pxx_val() were never intended to be used as lvalue and using it that way
> might just happen to work on all those platforms which define them as macros.
> They meant to just provide values for an entry as being determined by the platform.
> 
> In principle pxx_val() on an entry was not supposed to be modified directly from
> generic code without going through (again) platform helpers for any specific state
> change (write, old, dirty, special, huge etc). The current use case is a deviation
> for that.
> 
> I originally went with memset() just to load up the entries with non-zero value so
> that we know pxx_clear() are really doing the clearing. The same is being followed
> for all pxx_same() checks.
> 
> Another way for fixing the problem would be to mark them with known attributes
> like write/young/huge etc instead which for sure will create non-zero entries.
> We can do that for pxx_clear() and pxx_same() tests and drop RANDOM_NZVALUE
> completely. Does that sound good ?

Umm, not really. Those mkwrite/young/huge etc. helpers only exist for
page table levels where we can also have large mappings, at least on s390.
Also, on s390 we again check for certain sanity before actually setting
the bits.
The good news is that at least for the pxx_same() checks the memset() is no
problem, because pxx_same() does not do anything beyond the sameness check.

For the pxx_clear_tests(), maybe it could be an option to put them behind the
pxx_populate_tests(), and rely on them having properly populated (non-clear)
values after that?
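
I.e. roughly (sketch):

	pud_populate_tests(mm, pudp, saved_pmdp);
	pud_clear_tests(pudp);	/* entry is known to be non-clear here */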

[...]
> > 
> > Actually, using get_unmapped_area() as suggested by Kirill could also
> > solve this issue. We do create a new mm with 3-level page tables on s390,
> > and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
> > arch_get_unmapped_area(), depending on the addr. But I currently don't
> > see how / where arch_get_unmapped_area() is set up for such a dummy mm
> > created by mm_alloc().  
> 
> Normally they are set during program loading but we can set it up explicitly
> for the test mm_struct if we need to but there are some other challenges.
> 
> load_[aout|elf|flat|..]_binary()
> 	setup_new_exec()
> 		arch_pick_mmap_layout().
> 
> I did some initial experiments around get_unmapped_area(). Seems bit tricky
> to get it working on a pure 'test' mm_struct. It expects a real user context
> in the form of current->mm.

Yes, that's where I stopped because it looked rather complicated :-)
Not sure why Kirill suggested it initially, but if using get_unmapped_area()
would only be necessary to get properly initialized page table levels
on s390, you could also defer this to a later add-on patch.

Regards,
Gerald
Anshuman Khandual Sept. 10, 2019, 3:56 a.m. UTC | #15
On 09/09/2019 08:43 PM, Kirill A. Shutemov wrote:
> On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
>>> On Fri, 6 Sep 2019 11:58:59 +0530
>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>
>>>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>   
>>>>>>> [...]    
>>>>>>>> +
>>>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>>>> +{
>>>>>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>>>> +	pud_clear(pudp);
>>>>>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>>>> +}    
>>>>>>>
>>>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>>>> fail.
>>>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>>>> the lower 12 bits would be preserved?    
>>>>>>
>>>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>>>> still do the trick for other platforms, they just need non zero value. So on
>>>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>>>> entering this function which would make sure that pud_clear() really does
>>>>>> clear the entry ?  
>>>>>
>>>>> Yes, in theory the table entry on s390 would have the type set in the last
>>>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>>>> others, I would still suggest preserving all 12 bits since those would contain
>>>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>>>> would also work with the memset, but for consistency I think the same logic
>>>>> should be used in all pxd_clear_tests.  
>>>>
>>>> Makes sense but..
>>>>
>>>> There is a small challenge with this. Modifying individual bits on a given
>>>> page table entry from generic code like this test case is bit tricky. That
>>>> is because there are not enough helpers to create entries with an absolute
>>>> value. This would have been easier if all the platforms provided functions
>>>> like __pxx() which is not the case now. Otherwise something like this should
>>>> have worked.
>>>>
>>>>
>>>> pud_t pud = READ_ONCE(*pudp);
>>>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>>>> WRITE_ONCE(*pudp, pud);
>>>>
>>>> But __pud() will fail to build in many platforms.
>>>
>>> Hmm, I simply used this on my system to make pud_clear_tests() work, not
>>> sure if it works on all archs:
>>>
>>> pud_val(*pudp) |= RANDOM_NZVALUE;
>>
>> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
>> has been defined there.
> 
> Use instead
> 
> 	*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);

Agreed.

As I had mentioned before, this would really have been the cleanest approach.

> 
> It *should* be more portable.

Not really, because not all the platforms have __pxx() definitions right now.
Going with these will clearly cause build failures on the affected platforms.
Let's examine __pud(), for instance. It is defined only on these platforms:

arch/arm64/include/asm/pgtable-types.h:		#define __pud(x) ((pud_t) { (x) } )
arch/mips/include/asm/pgtable-64.h:		#define __pud(x) ((pud_t) { (x) })
arch/powerpc/include/asm/pgtable-be-types.h:	#define __pud(x) ((pud_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h:	#define __pud(x) ((pud_t) { (x) })
arch/s390/include/asm/page.h:			#define __pud(x) ((pud_t) { (x) } )
arch/sparc/include/asm/page_64.h:		#define __pud(x) ((pud_t) { (x) } )
arch/sparc/include/asm/page_64.h:		#define __pud(x) (x)
arch/x86/include/asm/pgtable.h:			#define __pud(x) native_make_pud(x)

Similarly for __pmd()

arch/alpha/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
arch/arm/include/asm/page-nommu.h:		#define __pmd(x)  (x)
arch/arm/include/asm/pgtable-2level-types.h:	#define __pmd(x)  ((pmd_t) { (x) } )
arch/arm/include/asm/pgtable-2level-types.h:	#define __pmd(x)  (x)
arch/arm/include/asm/pgtable-3level-types.h:	#define __pmd(x)  ((pmd_t) { (x) } )
arch/arm/include/asm/pgtable-3level-types.h:	#define __pmd(x)  (x)
arch/arm64/include/asm/pgtable-types.h:		#define __pmd(x)  ((pmd_t) { (x) } )
arch/m68k/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { { (x) }, })
arch/mips/include/asm/pgtable-64.h:		#define __pmd(x)  ((pmd_t) { (x) } )
arch/nds32/include/asm/page.h:			#define __pmd(x)  (x)
arch/parisc/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
arch/parisc/include/asm/page.h:			#define __pmd(x)  (x)
arch/powerpc/include/asm/pgtable-be-types.h:	#define __pmd(x)  ((pmd_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h:	#define __pmd(x)  ((pmd_t) { (x) })
arch/riscv/include/asm/pgtable-64.h:		#define __pmd(x)  ((pmd_t) { (x) })
arch/s390/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
arch/sh/include/asm/pgtable-3level.h:		#define __pmd(x)  ((pmd_t) { (x) } )
arch/sparc/include/asm/page_32.h:		#define __pmd(x)  ((pmd_t) { { (x) }, })
arch/sparc/include/asm/page_32.h:		#define __pmd(x)  ((pmd_t) { { (x) }, })
arch/sparc/include/asm/page_64.h:		#define __pmd(x)  ((pmd_t) { (x) } )
arch/sparc/include/asm/page_64.h:		#define __pmd(x)  (x)
arch/um/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
arch/um/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
arch/x86/include/asm/pgtable.h:			#define __pmd(x)  native_make_pmd(x)

Similarly for __pgd()

arch/alpha/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/alpha/include/asm/page.h:			#define __pgd(x)  (x)
arch/arc/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
arch/arc/include/asm/page.h:			#define __pgd(x)  (x)
arch/arm/include/asm/pgtable-3level-types.h:	#define __pgd(x)  ((pgd_t) { (x) } )
arch/arm/include/asm/pgtable-3level-types.h:	#define __pgd(x)  (x)
arch/arm64/include/asm/pgtable-types.h:		#define __pgd(x)  ((pgd_t) { (x) } )
arch/csky/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
arch/hexagon/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
arch/m68k/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/mips/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/nds32/include/asm/page.h:			#define __pgd(x)  (x)
arch/nios2/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
arch/openrisc/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
arch/parisc/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/parisc/include/asm/page.h:			#define __pgd(x)  (x)
arch/powerpc/include/asm/pgtable-be-types.h:	#define __pgd(x)  ((pgd_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h:	#define __pgd(x)  ((pgd_t) { (x) })
arch/riscv/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
arch/s390/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/sh/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/sparc/include/asm/page_32.h:		#define __pgd(x)  ((pgd_t) { (x) } )
arch/sparc/include/asm/page_32.h:		#define __pgd(x)  (x)
arch/sparc/include/asm/page_64.h:		#define __pgd(x)  ((pgd_t) { (x) } )
arch/sparc/include/asm/page_64.h:		#define __pgd(x)  (x)
arch/um/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
arch/unicore32/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
arch/unicore32/include/asm/page.h:		#define __pgd(x)  (x)
arch/x86/include/asm/pgtable.h:			#define __pgd(x)  native_make_pgd(x)
arch/xtensa/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )

Similarly for __p4d()

arch/s390/include/asm/page.h:			#define __p4d(x)  ((p4d_t) { (x) } )
arch/x86/include/asm/pgtable.h:			#define __p4d(x)  native_make_p4d(x)

The search pattern here has been "#define __pxx(". Unless I am missing something,
I don't see how we can use these without risking build failures.
Christophe Leroy Sept. 10, 2019, 4:45 a.m. UTC | #16
On 09/10/2019 03:56 AM, Anshuman Khandual wrote:
> 
> 
> On 09/09/2019 08:43 PM, Kirill A. Shutemov wrote:
>> On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
>>>> On Fri, 6 Sep 2019 11:58:59 +0530
>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>
>>>>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>>>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>>    
>>>>>>>> [...]
>>>>>>>>> +
>>>>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>>>>> +{
>>>>>>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>>>>> +	pud_clear(pudp);
>>>>>>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>>>>> fail.
>>>>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>>>>> the lower 12 bits would be preserved?
>>>>>>>
>>>>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>>>>> still do the trick for other platforms, they just need non zero value. So on
>>>>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>>>>> entering this function which would make sure that pud_clear() really does
>>>>>>> clear the entry ?
>>>>>>
>>>>>> Yes, in theory the table entry on s390 would have the type set in the last
>>>>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>>>>> others, I would still suggest preserving all 12 bits since those would contain
>>>>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>>>>> would also work with the memset, but for consistency I think the same logic
>>>>>> should be used in all pxd_clear_tests.
>>>>>
>>>>> Makes sense but..
>>>>>
>>>>> There is a small challenge with this. Modifying individual bits on a given
>>>>> page table entry from generic code like this test case is bit tricky. That
>>>>> is because there are not enough helpers to create entries with an absolute
>>>>> value. This would have been easier if all the platforms provided functions
>>>>> like __pxx() which is not the case now. Otherwise something like this should
>>>>> have worked.
>>>>>
>>>>>
>>>>> pud_t pud = READ_ONCE(*pudp);
>>>>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>>>>> WRITE_ONCE(*pudp, pud);
>>>>>
>>>>> But __pud() will fail to build in many platforms.
>>>>
>>>> Hmm, I simply used this on my system to make pud_clear_tests() work, not
>>>> sure if it works on all archs:
>>>>
>>>> pud_val(*pudp) |= RANDOM_NZVALUE;
>>>
>>> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
>>> has been defined there.
>>
>> Use instead
>>
>> 	*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
> 
> Agreed.
> 
> As I had mentioned before this would have been really the cleanest approach.
> 
>>
>> It *should* be more portable.
> 
> Not really, because not all the platforms have __pxx() definitions right now.
> Going with these will clearly cause build failures on affected platforms. Lets
> examine __pud() for instance. It is defined only on these platforms.
> 
> arch/arm64/include/asm/pgtable-types.h:		#define __pud(x) ((pud_t) { (x) } )
> arch/mips/include/asm/pgtable-64.h:		#define __pud(x) ((pud_t) { (x) })
> arch/powerpc/include/asm/pgtable-be-types.h:	#define __pud(x) ((pud_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h:	#define __pud(x) ((pud_t) { (x) })
> arch/s390/include/asm/page.h:			#define __pud(x) ((pud_t) { (x) } )
> arch/sparc/include/asm/page_64.h:		#define __pud(x) ((pud_t) { (x) } )
> arch/sparc/include/asm/page_64.h:		#define __pud(x) (x)
> arch/x86/include/asm/pgtable.h:			#define __pud(x) native_make_pud(x)

You missed:
arch/x86/include/asm/paravirt.h:static inline pud_t __pud(pudval_t val)
include/asm-generic/pgtable-nop4d-hack.h:#define __pud(x)	((pud_t) { __pgd(x) })
include/asm-generic/pgtable-nopud.h:#define __pud(x)	((pud_t) { __p4d(x) })

> 
> Similarly for __pmd()
> 
> arch/alpha/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
> arch/arm/include/asm/page-nommu.h:		#define __pmd(x)  (x)
> arch/arm/include/asm/pgtable-2level-types.h:	#define __pmd(x)  ((pmd_t) { (x) } )
> arch/arm/include/asm/pgtable-2level-types.h:	#define __pmd(x)  (x)
> arch/arm/include/asm/pgtable-3level-types.h:	#define __pmd(x)  ((pmd_t) { (x) } )
> arch/arm/include/asm/pgtable-3level-types.h:	#define __pmd(x)  (x)
> arch/arm64/include/asm/pgtable-types.h:		#define __pmd(x)  ((pmd_t) { (x) } )
> arch/m68k/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { { (x) }, })
> arch/mips/include/asm/pgtable-64.h:		#define __pmd(x)  ((pmd_t) { (x) } )
> arch/nds32/include/asm/page.h:			#define __pmd(x)  (x)
> arch/parisc/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
> arch/parisc/include/asm/page.h:			#define __pmd(x)  (x)
> arch/powerpc/include/asm/pgtable-be-types.h:	#define __pmd(x)  ((pmd_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h:	#define __pmd(x)  ((pmd_t) { (x) })
> arch/riscv/include/asm/pgtable-64.h:		#define __pmd(x)  ((pmd_t) { (x) })
> arch/s390/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
> arch/sh/include/asm/pgtable-3level.h:		#define __pmd(x)  ((pmd_t) { (x) } )
> arch/sparc/include/asm/page_32.h:		#define __pmd(x)  ((pmd_t) { { (x) }, })
> arch/sparc/include/asm/page_32.h:		#define __pmd(x)  ((pmd_t) { { (x) }, })
> arch/sparc/include/asm/page_64.h:		#define __pmd(x)  ((pmd_t) { (x) } )
> arch/sparc/include/asm/page_64.h:		#define __pmd(x)  (x)
> arch/um/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
> arch/um/include/asm/page.h:			#define __pmd(x)  ((pmd_t) { (x) } )
> arch/x86/include/asm/pgtable.h:			#define __pmd(x)  native_make_pmd(x)

You missed:
arch/x86/include/asm/paravirt.h:static inline pmd_t __pmd(pmdval_t val)
include/asm-generic/page.h:#define __pmd(x)     ((pmd_t) { (x) } )
include/asm-generic/pgtable-nopmd.h:#define __pmd(x)	((pmd_t) { __pud(x) } )


> 
> Similarly for __pgd()
> 
> arch/alpha/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/alpha/include/asm/page.h:			#define __pgd(x)  (x)
> arch/arc/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
> arch/arc/include/asm/page.h:			#define __pgd(x)  (x)
> arch/arm/include/asm/pgtable-3level-types.h:	#define __pgd(x)  ((pgd_t) { (x) } )
> arch/arm/include/asm/pgtable-3level-types.h:	#define __pgd(x)  (x)
> arch/arm64/include/asm/pgtable-types.h:		#define __pgd(x)  ((pgd_t) { (x) } )
> arch/csky/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
> arch/hexagon/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
> arch/m68k/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/mips/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/nds32/include/asm/page.h:			#define __pgd(x)  (x)
> arch/nios2/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
> arch/openrisc/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
> arch/parisc/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/parisc/include/asm/page.h:			#define __pgd(x)  (x)
> arch/powerpc/include/asm/pgtable-be-types.h:	#define __pgd(x)  ((pgd_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h:	#define __pgd(x)  ((pgd_t) { (x) })
> arch/riscv/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) })
> arch/s390/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/sh/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_32.h:		#define __pgd(x)  ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_32.h:		#define __pgd(x)  (x)
> arch/sparc/include/asm/page_64.h:		#define __pgd(x)  ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_64.h:		#define __pgd(x)  (x)
> arch/um/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )
> arch/unicore32/include/asm/page.h:		#define __pgd(x)  ((pgd_t) { (x) })
> arch/unicore32/include/asm/page.h:		#define __pgd(x)  (x)
> arch/x86/include/asm/pgtable.h:			#define __pgd(x)  native_make_pgd(x)
> arch/xtensa/include/asm/page.h:			#define __pgd(x)  ((pgd_t) { (x) } )

You missed:
arch/x86/include/asm/paravirt.h:static inline pgd_t __pgd(pgdval_t val)
include/asm-generic/page.h:#define __pgd(x)     ((pgd_t) { (x) } )


> 
> Similarly for __p4d()
> 
> arch/s390/include/asm/page.h:			#define __p4d(x)  ((p4d_t) { (x) } )
> arch/x86/include/asm/pgtable.h:			#define __p4d(x)  native_make_p4d(x)

You missed:
arch/x86/include/asm/paravirt.h:static inline p4d_t __p4d(p4dval_t val)
include/asm-generic/5level-fixup.h:#define __p4d(x)	__pgd(x)
include/asm-generic/pgtable-nop4d.h:#define __p4d(x)	((p4d_t) { __pgd(x) })


> 
> The search pattern here has been "#define __pxx(". Unless I am missing something,
> I dont see how we can use these without risking build failures.
> 

I guess you missed that arches not defining them fall back on the
definitions in include/asm-generic.

Christophe
Anshuman Khandual Sept. 10, 2019, 5:43 a.m. UTC | #17
On 09/10/2019 10:15 AM, Christophe Leroy wrote:
> 
> 
> On 09/10/2019 03:56 AM, Anshuman Khandual wrote:
>>
>>
>> On 09/09/2019 08:43 PM, Kirill A. Shutemov wrote:
>>> On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
>>>>> On Fri, 6 Sep 2019 11:58:59 +0530
>>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>
>>>>>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>>>>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>>>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>>>   
>>>>>>>>> [...]
>>>>>>>>>> +
>>>>>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>>>>>> +{
>>>>>>>>>> +    memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>>>>>> +    pud_clear(pudp);
>>>>>>>>>> +    WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>>>>>> fail.
>>>>>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>>>>>> the lower 12 bits would be preserved?
>>>>>>>>
>>>>>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>>>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>>>>>> still do the trick for other platforms, they just need non zero value. So on
>>>>>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>>>>>> entering this function which would make sure that pud_clear() really does
>>>>>>>> clear the entry ?
>>>>>>>
>>>>>>> Yes, in theory the table entry on s390 would have the type set in the last
>>>>>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>>>>>> others, I would still suggest preserving all 12 bits since those would contain
>>>>>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>>>>>> would also work with the memset, but for consistency I think the same logic
>>>>>>> should be used in all pxd_clear_tests.
>>>>>>
>>>>>> Makes sense but..
>>>>>>
>>>>>> There is a small challenge with this. Modifying individual bits on a given
>>>>>> page table entry from generic code like this test case is bit tricky. That
>>>>>> is because there are not enough helpers to create entries with an absolute
>>>>>> value. This would have been easier if all the platforms provided functions
>>>>>> like __pxx() which is not the case now. Otherwise something like this should
>>>>>> have worked.
>>>>>>
>>>>>>
>>>>>> pud_t pud = READ_ONCE(*pudp);
>>>>>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>>>>>> WRITE_ONCE(*pudp, pud);
>>>>>>
>>>>>> But __pud() will fail to build in many platforms.
>>>>>
>>>>> Hmm, I simply used this on my system to make pud_clear_tests() work, not
>>>>> sure if it works on all archs:
>>>>>
>>>>> pud_val(*pudp) |= RANDOM_NZVALUE;
>>>>
>>>> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
>>>> has been defined there.
>>>
>>> Use instead
>>>
>>>     *pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
>>
>> Agreed.
>>
>> As I had mentioned before this would have been really the cleanest approach.
>>
>>>
>>> It *should* be more portable.
>>
>> Not really, because not all the platforms have __pxx() definitions right now.
>> Going with these will clearly cause build failures on affected platforms. Lets
>> examine __pud() for instance. It is defined only on these platforms.
>>
>> arch/arm64/include/asm/pgtable-types.h:        #define __pud(x) ((pud_t) { (x) } )
>> arch/mips/include/asm/pgtable-64.h:        #define __pud(x) ((pud_t) { (x) })
>> arch/powerpc/include/asm/pgtable-be-types.h:    #define __pud(x) ((pud_t) { cpu_to_be64(x) })
>> arch/powerpc/include/asm/pgtable-types.h:    #define __pud(x) ((pud_t) { (x) })
>> arch/s390/include/asm/page.h:            #define __pud(x) ((pud_t) { (x) } )
>> arch/sparc/include/asm/page_64.h:        #define __pud(x) ((pud_t) { (x) } )
>> arch/sparc/include/asm/page_64.h:        #define __pud(x) (x)
>> arch/x86/include/asm/pgtable.h:            #define __pud(x) native_make_pud(x)
> 
> You missed:
> arch/x86/include/asm/paravirt.h:static inline pud_t __pud(pudval_t val)
> include/asm-generic/pgtable-nop4d-hack.h:#define __pud(x)                ((pud_t) { __pgd(x) })
> include/asm-generic/pgtable-nopud.h:#define __pud(x)        ((pud_t) { __p4d(x) })
> 
>>
>> Similarly for __pmd():
>>
>> arch/alpha/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/arm/include/asm/page-nommu.h:        #define __pmd(x)  (x)
>> arch/arm/include/asm/pgtable-2level-types.h:    #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/arm/include/asm/pgtable-2level-types.h:    #define __pmd(x)  (x)
>> arch/arm/include/asm/pgtable-3level-types.h:    #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/arm/include/asm/pgtable-3level-types.h:    #define __pmd(x)  (x)
>> arch/arm64/include/asm/pgtable-types.h:        #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/m68k/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { { (x) }, })
>> arch/mips/include/asm/pgtable-64.h:        #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/nds32/include/asm/page.h:            #define __pmd(x)  (x)
>> arch/parisc/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/parisc/include/asm/page.h:            #define __pmd(x)  (x)
>> arch/powerpc/include/asm/pgtable-be-types.h:    #define __pmd(x)  ((pmd_t) { cpu_to_be64(x) })
>> arch/powerpc/include/asm/pgtable-types.h:    #define __pmd(x)  ((pmd_t) { (x) })
>> arch/riscv/include/asm/pgtable-64.h:        #define __pmd(x)  ((pmd_t) { (x) })
>> arch/s390/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/sh/include/asm/pgtable-3level.h:        #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/sparc/include/asm/page_32.h:        #define __pmd(x)  ((pmd_t) { { (x) }, })
>> arch/sparc/include/asm/page_32.h:        #define __pmd(x)  ((pmd_t) { { (x) }, })
>> arch/sparc/include/asm/page_64.h:        #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/sparc/include/asm/page_64.h:        #define __pmd(x)  (x)
>> arch/um/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/um/include/asm/page.h:            #define __pmd(x)  ((pmd_t) { (x) } )
>> arch/x86/include/asm/pgtable.h:            #define __pmd(x)  native_make_pmd(x)
> 
> You missed:
> arch/x86/include/asm/paravirt.h:static inline pmd_t __pmd(pmdval_t val)
> include/asm-generic/page.h:#define __pmd(x)     ((pmd_t) { (x) } )
> include/asm-generic/pgtable-nopmd.h:#define __pmd(x)        ((pmd_t) { __pud(x) } )
> 
> 
>>
>> Similarly for __pgd():
>>
>> arch/alpha/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/alpha/include/asm/page.h:            #define __pgd(x)  (x)
>> arch/arc/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) })
>> arch/arc/include/asm/page.h:            #define __pgd(x)  (x)
>> arch/arm/include/asm/pgtable-3level-types.h:    #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/arm/include/asm/pgtable-3level-types.h:    #define __pgd(x)  (x)
>> arch/arm64/include/asm/pgtable-types.h:        #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/csky/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) })
>> arch/hexagon/include/asm/page.h:        #define __pgd(x)  ((pgd_t) { (x) })
>> arch/m68k/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/mips/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/nds32/include/asm/page.h:            #define __pgd(x)  (x)
>> arch/nios2/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) })
>> arch/openrisc/include/asm/page.h:        #define __pgd(x)  ((pgd_t) { (x) })
>> arch/parisc/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/parisc/include/asm/page.h:            #define __pgd(x)  (x)
>> arch/powerpc/include/asm/pgtable-be-types.h:    #define __pgd(x)  ((pgd_t) { cpu_to_be64(x) })
>> arch/powerpc/include/asm/pgtable-types.h:    #define __pgd(x)  ((pgd_t) { (x) })
>> arch/riscv/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) })
>> arch/s390/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/sh/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/sparc/include/asm/page_32.h:        #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/sparc/include/asm/page_32.h:        #define __pgd(x)  (x)
>> arch/sparc/include/asm/page_64.h:        #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/sparc/include/asm/page_64.h:        #define __pgd(x)  (x)
>> arch/um/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
>> arch/unicore32/include/asm/page.h:        #define __pgd(x)  ((pgd_t) { (x) })
>> arch/unicore32/include/asm/page.h:        #define __pgd(x)  (x)
>> arch/x86/include/asm/pgtable.h:            #define __pgd(x)  native_make_pgd(x)
>> arch/xtensa/include/asm/page.h:            #define __pgd(x)  ((pgd_t) { (x) } )
> 
> You missed:
> arch/x86/include/asm/paravirt.h:static inline pgd_t __pgd(pgdval_t val)
> include/asm-generic/page.h:#define __pgd(x)     ((pgd_t) { (x) } )
> 
> 
>>
>> Similarly for __p4d():
>>
>> arch/s390/include/asm/page.h:            #define __p4d(x)  ((p4d_t) { (x) } )
>> arch/x86/include/asm/pgtable.h:            #define __p4d(x)  native_make_p4d(x)
> 
> You missed:
> arch/x86/include/asm/paravirt.h:static inline p4d_t __p4d(p4dval_t val)
> include/asm-generic/5level-fixup.h:#define __p4d(x) __pgd(x)
> include/asm-generic/pgtable-nop4d.h:#define __p4d(x)        ((p4d_t) { __pgd(x) })
> 
> 
>>
>> The search pattern here has been "#define __pxx(". Unless I am missing something,
>> I don't see how we can use these without risking build failures.
>>
> 
> I guess you missed that arches not defining them fall back on the definitions in include/asm-generic.

You are right. I was confused about whether these generic definitions were
really applicable as a fallback for all those platforms (with so many page
table level folding combinations available) when they don't define their own.
Sure, I will take this approach and try building it on multiple platforms.
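Something like the below is what I have in mind now -- a rough, untested
sketch relying on those asm-generic fallbacks. RANDOM_ORVALUE is just a
placeholder name and mask here, picked so that the lower 12 bits stay
clear as Gerald suggested:

/* Placeholder: any random pattern with the lower 12 bits kept clear */
#define RANDOM_ORVALUE	(0xbe000UL)

static void pud_clear_tests(pud_t *pudp)
{
	pud_t pud = READ_ONCE(*pudp);

	/*
	 * OR random bits into the entry while preserving the lower
	 * 12 bits, so that arch-specific flags (e.g. the s390 table
	 * type) survive and pud_clear() has a valid entry to clear.
	 */
	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
	WRITE_ONCE(*pudp, pud);
	pud_clear(pudp);
	WARN_ON(!pud_none(READ_ONCE(*pudp)));
}

The other pxd_clear_tests() would follow the same pattern with their
respective __pxx()/pxx_val() helpers.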

Patch

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 327b3ebf23bf..ce9c397f7b07 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -117,3 +117,17 @@  config DEBUG_RODATA_TEST
     depends on STRICT_KERNEL_RWX
     ---help---
       This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_ARCH_PGTABLE_TEST
+	bool "Test arch page table helpers for semantics compliance"
+	depends on MMU
+	depends on DEBUG_KERNEL
+	help
+	  This option provides a kernel module which can be used to test
+	  architecture page table helper functions on various platforms,
+	  verifying that they comply with expected generic MM semantics.
+	  This will help architecture code in making sure that any changes
+	  to or new additions of these helpers still conform to expected
+	  generic MM semantics.
+
+	  If unsure, say N.
diff --git a/mm/Makefile b/mm/Makefile
index d996846697ef..bb572c5aa8c5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -86,6 +86,7 @@  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
+obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
 obj-$(CONFIG_PAGE_OWNER) += page_owner.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
new file mode 100644
index 000000000000..f15be8a73723
--- /dev/null
+++ b/mm/arch_pgtable_test.c
@@ -0,0 +1,425 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel module validates architecture page table helpers &
+ * accessors and helps in verifying their continued compliance with
+ * generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/hugetlb.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/pfn_t.h>
+#include <linux/gfp.h>
+#include <linux/spinlock.h>
+#include <linux/sched/mm.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Basic operations
+ *
+ * mkold(entry)			= An old and not a young entry
+ * mkyoung(entry)		= A young and not an old entry
+ * mkdirty(entry)		= A dirty and not a clean entry
+ * mkclean(entry)		= A clean and not a dirty entry
+ * mkwrite(entry)		= A writable and not a write-protected entry
+ * wrprotect(entry)		= A write-protected and not a writable entry
+ * pxx_bad(entry)		= A mapped and non-table entry
+ * pxx_same(entry1, entry2)	= Both entries hold the exact same value
+ */
+#define VADDR_TEST	(PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
+#define VMA_TEST_FLAGS	(VM_READ|VM_WRITE|VM_EXEC)
+#define RANDOM_NZVALUE	(0xbe)
+
+static bool pud_aligned;
+static bool pmd_aligned;
+
+extern struct mm_struct *mm_alloc(void);
+
+static void pte_basic_tests(struct page *page, pgprot_t prot)
+{
+	pte_t pte = mk_pte(page, prot);
+
+	WARN_ON(!pte_same(pte, pte));
+	WARN_ON(!pte_young(pte_mkyoung(pte)));
+	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
+	WARN_ON(!pte_write(pte_mkwrite(pte)));
+	WARN_ON(pte_young(pte_mkold(pte)));
+	WARN_ON(pte_dirty(pte_mkclean(pte)));
+	WARN_ON(pte_write(pte_wrprotect(pte)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
+static void pmd_basic_tests(struct page *page, pgprot_t prot)
+{
+	pmd_t pmd;
+
+	/*
+	 * Memory block here must be PMD_SIZE aligned. Abort this
+	 * test in case we could not allocate such a memory block.
+	 */
+	if (!pmd_aligned) {
+		pr_warn("Could not proceed with PMD tests\n");
+		return;
+	}
+
+	pmd = mk_pmd(page, prot);
+	WARN_ON(!pmd_same(pmd, pmd));
+	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
+	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
+	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
+	WARN_ON(pmd_young(pmd_mkold(pmd)));
+	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
+	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pmd_bad().
+	 */
+	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+#else
+static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
+#endif
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void pud_basic_tests(struct page *page, pgprot_t prot)
+{
+	pud_t pud;
+
+	/*
+	 * Memory block here must be PUD_SIZE aligned. Abort this
+	 * test in case we could not allocate such a memory block.
+	 */
+	if (!pud_aligned) {
+		pr_warn("Could not proceed with PUD tests\n");
+		return;
+	}
+
+	pud = pfn_pud(page_to_pfn(page), prot);
+	WARN_ON(!pud_same(pud, pud));
+	WARN_ON(!pud_young(pud_mkyoung(pud)));
+	WARN_ON(!pud_write(pud_mkwrite(pud)));
+	WARN_ON(pud_write(pud_wrprotect(pud)));
+	WARN_ON(pud_young(pud_mkold(pud)));
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pud_bad().
+	 */
+	WARN_ON(!pud_bad(pud_mkhuge(pud)));
+#endif
+}
+#else
+static void pud_basic_tests(struct page *page, pgprot_t prot) { }
+#endif
+
+static void p4d_basic_tests(struct page *page, pgprot_t prot)
+{
+	p4d_t p4d;
+
+	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
+	WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void pgd_basic_tests(struct page *page, pgprot_t prot)
+{
+	pgd_t pgd;
+
+	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
+	WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+static void pud_clear_tests(pud_t *pudp)
+{
+	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
+	pud_clear(pudp);
+	WARN_ON(!pud_none(READ_ONCE(*pudp)));
+}
+
+static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
+{
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pud_bad().
+	 */
+	pmd_clear(pmdp);
+	pud_clear(pudp);
+	pud_populate(mm, pudp, pmdp);
+	WARN_ON(pud_bad(READ_ONCE(*pudp)));
+}
+#else
+static void pud_clear_tests(pud_t *pudp) { }
+static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
+{
+}
+#endif
+
+#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
+static void p4d_clear_tests(p4d_t *p4dp)
+{
+	memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
+	p4d_clear(p4dp);
+	WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
+}
+
+static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
+{
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as p4d_bad().
+	 */
+	pud_clear(pudp);
+	p4d_clear(p4dp);
+	p4d_populate(mm, p4dp, pudp);
+	WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
+}
+#else
+static void p4d_clear_tests(p4d_t *p4dp) { }
+static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
+{
+}
+#endif
+
+#ifndef __PAGETABLE_P4D_FOLDED
+static void pgd_clear_tests(pgd_t *pgdp)
+{
+	memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
+	pgd_clear(pgdp);
+	WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
+}
+
+static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pgd_bad().
+	 */
+	p4d_clear(p4dp);
+	pgd_clear(pgdp);
+	pgd_populate(mm, pgdp, p4dp);
+	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
+}
+#else
+static void pgd_clear_tests(pgd_t *pgdp) { }
+static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+}
+#endif
+
+static void pte_clear_tests(pte_t *ptep)
+{
+	memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
+	pte_clear(NULL, 0, ptep);
+	WARN_ON(!pte_none(READ_ONCE(*ptep)));
+}
+
+static void pmd_clear_tests(pmd_t *pmdp)
+{
+	memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
+	pmd_clear(pmdp);
+	WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
+}
+
+static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
+			       pgtable_t pgtable)
+{
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pmd_bad().
+	 */
+	pmd_clear(pmdp);
+	pmd_populate(mm, pmdp, pgtable);
+	WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
+}
+
+static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
+			    unsigned long nr_pages)
+{
+	unsigned long i, end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		if (!pfn_valid(i))
+			return false;
+
+		page = pfn_to_page(i);
+
+		if (page_zone(page) != z)
+			return false;
+
+		if (PageReserved(page))
+			return false;
+
+		if (page_count(page) > 0)
+			return false;
+
+		if (PageHuge(page))
+			return false;
+	}
+	return true;
+}
+
+static struct page *alloc_gigantic_page(nodemask_t *nodemask,
+					int nid, gfp_t gfp_mask, int order)
+{
+	struct zonelist *zonelist;
+	struct zone *zone;
+	struct zoneref *z;
+	enum zone_type zonesel;
+	unsigned long ret, pfn, flags, nr_pages;
+
+	nr_pages = 1UL << order;
+	zonesel = gfp_zone(gfp_mask);
+	zonelist = node_zonelist(nid, gfp_mask);
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
+		spin_lock_irqsave(&zone->lock, flags);
+		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+			if (pfn_range_valid(zone, pfn, nr_pages)) {
+				spin_unlock_irqrestore(&zone->lock, flags);
+				ret = alloc_contig_range(pfn, pfn + nr_pages,
+							 MIGRATE_MOVABLE,
+							 gfp_mask);
+				if (!ret)
+					return pfn_to_page(pfn);
+				spin_lock_irqsave(&zone->lock, flags);
+			}
+			pfn += nr_pages;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+	return NULL;
+}
+
+static struct page *alloc_mapped_page(void)
+{
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+	struct page *page = NULL;
+
+	page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
+				   gfp_mask, get_order(PUD_SIZE));
+	if (page) {
+		pud_aligned = true;
+		pmd_aligned = true;
+		return page;
+	}
+
+	page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
+	if (page) {
+		pmd_aligned = true;
+		return page;
+	}
+	return alloc_page(gfp_mask);
+}
+
+static void free_mapped_page(struct page *page)
+{
+	if (pud_aligned) {
+		unsigned long pfn = page_to_pfn(page);
+
+		free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
+		return;
+	}
+
+	if (pmd_aligned) {
+		int order = get_order(PMD_SIZE);
+
+		free_pages((unsigned long)page_address(page), order);
+		return;
+	}
+	free_page((unsigned long)page_address(page));
+}
+
+static int __init arch_pgtable_tests_init(void)
+{
+	struct mm_struct *mm;
+	struct page *page;
+	pgd_t *pgdp;
+	p4d_t *p4dp, *saved_p4dp;
+	pud_t *pudp, *saved_pudp;
+	pmd_t *pmdp, *saved_pmdp;
+	pte_t *ptep, *saved_ptep;
+	pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
+	unsigned long vaddr = VADDR_TEST;
+
+	mm = mm_alloc();
+	if (!mm) {
+		pr_err("mm_struct allocation failed\n");
+		return 1;
+	}
+
+	page = alloc_mapped_page();
+	if (!page) {
+		pr_err("memory allocation failed\n");
+		return 1;
+	}
+
+	pgdp = pgd_offset(mm, vaddr);
+	p4dp = p4d_alloc(mm, pgdp, vaddr);
+	pudp = pud_alloc(mm, p4dp, vaddr);
+	pmdp = pmd_alloc(mm, pudp, vaddr);
+	ptep = pte_alloc_map(mm, pmdp, vaddr);
+
+	/*
+	 * Save all the page table page addresses as the page table
+	 * entries will be used for testing with random or garbage
+	 * values. These saved addresses will be used for freeing
+	 * page table pages.
+	 */
+	saved_p4dp = p4d_offset(pgdp, 0UL);
+	saved_pudp = pud_offset(p4dp, 0UL);
+	saved_pmdp = pmd_offset(pudp, 0UL);
+	saved_ptep = pte_offset_map(pmdp, 0UL);
+
+	pte_basic_tests(page, prot);
+	pmd_basic_tests(page, prot);
+	pud_basic_tests(page, prot);
+	p4d_basic_tests(page, prot);
+	pgd_basic_tests(page, prot);
+
+	pte_clear_tests(ptep);
+	pmd_clear_tests(pmdp);
+	pud_clear_tests(pudp);
+	p4d_clear_tests(p4dp);
+	pgd_clear_tests(pgdp);
+
+	pmd_populate_tests(mm, pmdp, (pgtable_t) page);
+	pud_populate_tests(mm, pudp, pmdp);
+	p4d_populate_tests(mm, p4dp, pudp);
+	pgd_populate_tests(mm, pgdp, p4dp);
+
+	p4d_free(mm, saved_p4dp);
+	pud_free(mm, saved_pudp);
+	pmd_free(mm, saved_pmdp);
+	pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
+
+	mm_dec_nr_puds(mm);
+	mm_dec_nr_pmds(mm);
+	mm_dec_nr_ptes(mm);
+	__mmdrop(mm);
+
+	free_mapped_page(page);
+	return 0;
+}
+
+static void __exit arch_pgtable_tests_exit(void) { }
+
+module_init(arch_pgtable_tests_init);
+module_exit(arch_pgtable_tests_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Anshuman Khandual <anshuman.khandual@arm.com>");
+MODULE_DESCRIPTION("Test architecture page table helpers");