From patchwork Tue Oct 15 09:21:41 2019
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11189913
From: Anshuman Khandual
To: linux-mm@kvack.org
Cc: Anshuman Khandual , Andrew Morton , Vlastimil Babka , Greg Kroah-Hartman , Thomas Gleixner , Mike Rapoport , Mike Kravetz , Jason Gunthorpe , Dan Williams , Peter Zijlstra , Michal Hocko , Mark Rutland , Mark Brown , Steven Price , Ard Biesheuvel , Masahiro Yamada , Kees Cook , Tetsuo Handa , Matthew Wilcox , Sri Krishna chowdary , Dave Hansen , Russell King - ARM Linux ,
Michael Ellerman , Paul Mackerras , Martin Schwidefsky , Heiko Carstens , "David S. Miller" , Vineet Gupta , James Hogan , Paul Burton , Ralf Baechle , "Kirill A . Shutemov" , Gerald Schaefer , Christophe Leroy , David Rientjes , Andrea Arcangeli , Oscar Salvador , Mel Gorman , linux-snps-arc@lists.infradead.org, linux-mips@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH V6 1/2] mm/page_alloc: Make alloc_gigantic_page() available for general use
Date: Tue, 15 Oct 2019 14:51:41 +0530
Message-Id: <1571131302-32290-2-git-send-email-anshuman.khandual@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1571131302-32290-1-git-send-email-anshuman.khandual@arm.com>
References: <1571131302-32290-1-git-send-email-anshuman.khandual@arm.com>

alloc_gigantic_page() implements an allocation method that scans the zones
of a zonelist looking for a large, physically contiguous memory block, one
too large to be allocated through the buddy allocator. A subsequent patch
that tests arch page table helpers needs such a method to allocate a
PUD_SIZE-sized memory block, and other use cases may follow in the future.

So the actual memory allocation logic has been carved out of
alloc_gigantic_page() and made available through a new function,
alloc_gigantic_page_order(), which is built under CONFIG_CONTIG_ALLOC.
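The aligned scan performed by alloc_gigantic_page_order() can be modelled in userspace. The model rounds the zone start up to a multiple of 2^order (the ALIGN() call) and checks each naturally aligned window that still ends inside the zone (zone_spans_last_pfn()). It then advances by 2^order pages. This is a sketch under stated assumptions, not kernel code. enum page_state, range_is_free() and find_aligned_block() are hypothetical stand-ins for the pfn_valid(), page_zone(), PageReserved(), page_count() and PageHuge() checks. Neither the zone lock nor the final alloc_contig_range() call is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-pfn state standing in for the struct page checks. */
enum page_state { PAGE_FREE, PAGE_IN_USE, PAGE_RESERVED, PAGE_HOLE };

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Stand-in for pfn_range_valid_gigantic(): every pfn in the window is free. */
static bool range_is_free(const enum page_state *pages, unsigned long start,
			  unsigned long nr_pages)
{
	for (unsigned long i = start; i < start + nr_pages; i++)
		if (pages[i] != PAGE_FREE)
			return false;
	return true;
}

/*
 * Scan one zone [zone_start, zone_end) for the first naturally aligned block
 * of 2^order free pfns. Returns its first pfn, or -1 if there is none.
 */
static long find_aligned_block(const enum page_state *pages,
			       unsigned long zone_start, unsigned long zone_end,
			       unsigned int order)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long pfn = align_up(zone_start, nr_pages);

	/* Like zone_spans_last_pfn(): the window's last pfn must be in the zone. */
	while (pfn + nr_pages - 1 < zone_end) {
		if (range_is_free(pages, pfn, nr_pages))
			return (long)pfn;
		pfn += nr_pages;	/* next naturally aligned window */
	}
	return -1;
}
```

For example, suppose pfns 0 to 15 are free except pfn 5, the zone spans pfns 1 to 15, and the order is 2. The window at pfn 0 is never tried because the zone start is rounded up to 4. The window [4, 8) is rejected because of pfn 5, so the scan returns pfn 8.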
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: Greg Kroah-Hartman
Cc: Thomas Gleixner
Cc: Mike Rapoport
Cc: Mike Kravetz
Cc: Jason Gunthorpe
Cc: Dan Williams
Cc: Peter Zijlstra
Cc: Michal Hocko
Cc: Mark Rutland
Cc: Mark Brown
Cc: Steven Price
Cc: Ard Biesheuvel
Cc: Masahiro Yamada
Cc: Kees Cook
Cc: Tetsuo Handa
Cc: Matthew Wilcox
Cc: Sri Krishna chowdary
Cc: Dave Hansen
Cc: Russell King - ARM Linux
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "David S. Miller"
Cc: Vineet Gupta
Cc: James Hogan
Cc: Paul Burton
Cc: Ralf Baechle
Cc: Kirill A. Shutemov
Cc: Gerald Schaefer
Cc: Christophe Leroy
Cc: David Rientjes
Cc: Andrea Arcangeli
Cc: Oscar Salvador
Cc: Mel Gorman
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 include/linux/gfp.h |  3 ++
 mm/hugetlb.c        | 76 +----------------------------------
 mm/page_alloc.c     | 98 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 75 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fb07b503dc45..379ad23437d1 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -589,6 +589,9 @@ static inline bool pm_suspended_storage(void)
 /* The below functions must be run on a range from a single zone.
*/ extern int alloc_contig_range(unsigned long start, unsigned long end, unsigned migratetype, gfp_t gfp_mask); +extern struct page *alloc_gigantic_page_order(unsigned int order, + gfp_t gfp_mask, int nid, + nodemask_t *nodemask); #endif void free_contig_range(unsigned long pfn, unsigned int nr_pages); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 977f9a323a7a..d199556a4a2c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1023,86 +1023,12 @@ static void free_gigantic_page(struct page *page, unsigned int order) } #ifdef CONFIG_CONTIG_ALLOC -static int __alloc_gigantic_page(unsigned long start_pfn, - unsigned long nr_pages, gfp_t gfp_mask) -{ - unsigned long end_pfn = start_pfn + nr_pages; - return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE, - gfp_mask); -} - -static bool pfn_range_valid_gigantic(struct zone *z, - unsigned long start_pfn, unsigned long nr_pages) -{ - unsigned long i, end_pfn = start_pfn + nr_pages; - struct page *page; - - for (i = start_pfn; i < end_pfn; i++) { - if (!pfn_valid(i)) - return false; - - page = pfn_to_page(i); - - if (page_zone(page) != z) - return false; - - if (PageReserved(page)) - return false; - - if (page_count(page) > 0) - return false; - - if (PageHuge(page)) - return false; - } - - return true; -} - -static bool zone_spans_last_pfn(const struct zone *zone, - unsigned long start_pfn, unsigned long nr_pages) -{ - unsigned long last_pfn = start_pfn + nr_pages - 1; - return zone_spans_pfn(zone, last_pfn); -} - static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, int nid, nodemask_t *nodemask) { unsigned int order = huge_page_order(h); - unsigned long nr_pages = 1 << order; - unsigned long ret, pfn, flags; - struct zonelist *zonelist; - struct zone *zone; - struct zoneref *z; - - zonelist = node_zonelist(nid, gfp_mask); - for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), nodemask) { - spin_lock_irqsave(&zone->lock, flags); - pfn = ALIGN(zone->zone_start_pfn, nr_pages); - while 
(zone_spans_last_pfn(zone, pfn, nr_pages)) { - if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) { - /* - * We release the zone lock here because - * alloc_contig_range() will also lock the zone - * at some point. If there's an allocation - * spinning on this lock, it may win the race - * and cause alloc_contig_range() to fail... - */ - spin_unlock_irqrestore(&zone->lock, flags); - ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask); - if (!ret) - return pfn_to_page(pfn); - spin_lock_irqsave(&zone->lock, flags); - } - pfn += nr_pages; - } - - spin_unlock_irqrestore(&zone->lock, flags); - } - - return NULL; + return alloc_gigantic_page_order(order, gfp_mask, nid, nodemask); } static void prep_new_huge_page(struct hstate *h, struct page *page, int nid); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6ab8eb670fd3..0f67367213c6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8497,6 +8497,104 @@ int alloc_contig_range(unsigned long start, unsigned long end, pfn_max_align_up(end), migratetype); return ret; } + +static int __alloc_gigantic_page(unsigned long start_pfn, + unsigned long nr_pages, gfp_t gfp_mask) +{ + unsigned long end_pfn = start_pfn + nr_pages; + + return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE, + gfp_mask); +} + +static bool pfn_range_valid_gigantic(struct zone *z, unsigned long start_pfn, + unsigned long nr_pages) +{ + unsigned long i, end_pfn = start_pfn + nr_pages; + struct page *page; + + for (i = start_pfn; i < end_pfn; i++) { + if (!pfn_valid(i)) + return false; + + page = pfn_to_page(i); + + if (page_zone(page) != z) + return false; + + if (PageReserved(page)) + return false; + + if (page_count(page) > 0) + return false; + + if (PageHuge(page)) + return false; + } + return true; +} + +static bool zone_spans_last_pfn(const struct zone *zone, + unsigned long start_pfn, unsigned long nr_pages) +{ + unsigned long last_pfn = start_pfn + nr_pages - 1; + + return zone_spans_pfn(zone, last_pfn); +} + +/** + * 
alloc_gigantic_page_order() - try to allocate a contiguous block of 2^order pages + * @order: allocation order (typically MAX_ORDER or above) + * @gfp_mask: GFP mask to use during compaction + * @nid: allocation node + * @nodemask: allocation nodemask + * + * This routine is a wrapper around alloc_contig_range(). It scans over + * all zones on the applicable zonelist to find a contiguous pfn range that + * can then be allocated with alloc_contig_range(). It is intended for + * allocations of order MAX_ORDER and above, which buddy cannot satisfy. + * + * Return: page on success or NULL on failure. On success, a memory block + * of the given order starting at 'page' has been allocated. Memory + * allocated here needs to be freed with free_contig_range(). + */ +struct page *alloc_gigantic_page_order(unsigned int order, gfp_t gfp_mask, + int nid, nodemask_t *nodemask) +{ + unsigned long nr_pages = 1UL << order; + unsigned long ret, pfn, flags; + struct zonelist *zonelist; + struct zone *zone; + struct zoneref *z; + + zonelist = node_zonelist(nid, gfp_mask); + for_each_zone_zonelist_nodemask(zone, z, zonelist, + gfp_zone(gfp_mask), nodemask) { + spin_lock_irqsave(&zone->lock, flags); + + pfn = ALIGN(zone->zone_start_pfn, nr_pages); + while (zone_spans_last_pfn(zone, pfn, nr_pages)) { + if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) { + /* + * We release the zone lock here because + * alloc_contig_range() will also lock the zone + * at some point. If there's an allocation + * spinning on this lock, it may win the race + * and cause alloc_contig_range() to fail...
+ */ + spin_unlock_irqrestore(&zone->lock, flags); + ret = __alloc_gigantic_page(pfn, nr_pages, + gfp_mask); + if (!ret) + return pfn_to_page(pfn); + spin_lock_irqsave(&zone->lock, flags); + } + pfn += nr_pages; + } + spin_unlock_irqrestore(&zone->lock, flags); + } + return NULL; +} #endif /* CONFIG_CONTIG_ALLOC */ void free_contig_range(unsigned long pfn, unsigned int nr_pages)
From patchwork Tue Oct 15 09:21:42 2019
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11189919
From: Anshuman Khandual
To: linux-mm@kvack.org
Cc: Anshuman Khandual , Andrew Morton , Vlastimil Babka , Greg Kroah-Hartman , Thomas Gleixner , Mike Rapoport , Jason Gunthorpe , Dan Williams , Peter Zijlstra , Michal Hocko , Mark Rutland , Mark Brown , Steven Price
, Ard Biesheuvel , Masahiro Yamada , Kees Cook , Tetsuo Handa , Matthew Wilcox , Sri Krishna chowdary , Dave Hansen , Russell King - ARM Linux , Michael Ellerman , Paul Mackerras , Martin Schwidefsky , Heiko Carstens , "David S. Miller" , Vineet Gupta , James Hogan , Paul Burton , Ralf Baechle , "Kirill A . Shutemov" , Gerald Schaefer , Christophe Leroy , linux-snps-arc@lists.infradead.org, linux-mips@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH V6 2/2] mm/debug: Add tests validating architecture page table helpers
Date: Tue, 15 Oct 2019 14:51:42 +0530
Message-Id: <1571131302-32290-3-git-send-email-anshuman.khandual@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1571131302-32290-1-git-send-email-anshuman.khandual@arm.com>
References: <1571131302-32290-1-git-send-email-anshuman.khandual@arm.com>

This adds tests that validate architecture page table helpers and other
accessors for compliance with the expected generic MM semantics. This will
help various architectures validate changes to existing page table helpers
as well as the addition of new ones.

The test page table, and the memory pages used to create its entries at the
various levels, are all allocated from system memory with the required size
and alignment. If memory pages with the required size and alignment cannot
be allocated, the individual tests that depend on them are skipped.

The test is called right after init_mm_internals(), which is required for
alloc_contig_range() to work correctly. It is built and run when
CONFIG_DEBUG_VM_PGTABLE is selected along with CONFIG_DEBUG_VM.
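The allocation fallback used by the test can be modelled in userspace. The test tries a PUD_SIZE block first, then a PMD_SIZE block, then a single page, and skips the PUD or PMD basic tests when the matching block could not be obtained. This is a hypothetical sketch, not kernel code. fake_alloc_succeeds() and the SKETCH_* page counts are illustrative only; the counts assume 4K pages, where a PMD maps 2M and a PUD maps 1G. The sketch also assumes that a block of N pages comes back aligned to N pages, which the buddy allocator and the aligned scan in alloc_gigantic_page_order() provide.

```c
#include <stdbool.h>

/* Hypothetical sizes in base pages, assuming 4K pages: PMD maps 2M, PUD maps 1G. */
#define SKETCH_PMD_PAGES 512UL
#define SKETCH_PUD_PAGES (512UL * 512UL)

/* What the obtained block allows, mirroring pud_aligned/pmd_aligned. */
struct alloc_result {
	unsigned long nr_pages;	/* pages obtained; 0 means every attempt failed */
	bool pud_aligned;	/* PUD-sized block: PUD and PMD basic tests can run */
	bool pmd_aligned;	/* PMD-sized (or larger) block: PMD basic tests can run */
};

/*
 * Hypothetical allocator model: a request succeeds only if it does not
 * exceed the largest contiguous free block currently available.
 */
static bool fake_alloc_succeeds(unsigned long nr_pages, unsigned long largest_free)
{
	return nr_pages <= largest_free;
}

/* Try PUD-sized, then PMD-sized, then a single page, like alloc_mapped_page(). */
static struct alloc_result alloc_with_fallback(unsigned long largest_free)
{
	struct alloc_result r = { 0, false, false };

	if (fake_alloc_succeeds(SKETCH_PUD_PAGES, largest_free)) {
		r.nr_pages = SKETCH_PUD_PAGES;
		r.pud_aligned = true;
		r.pmd_aligned = true;	/* a PUD-aligned block is also PMD aligned */
	} else if (fake_alloc_succeeds(SKETCH_PMD_PAGES, largest_free)) {
		r.nr_pages = SKETCH_PMD_PAGES;
		r.pmd_aligned = true;
	} else if (fake_alloc_succeeds(1, largest_free)) {
		r.nr_pages = 1;		/* PMD and PUD basic tests will be skipped */
	}
	return r;
}
```

If even the single page cannot be allocated, debug_vm_pgtable() reports "memory allocation failed" and returns without running any test.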
Architectures that want to use this test also need to select
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE, which for now is limited to x86 and arm64.
Going forward, other architectures can enable it too, after fixing any build
or runtime problems with their page table helpers.

Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: Greg Kroah-Hartman
Cc: Thomas Gleixner
Cc: Mike Rapoport
Cc: Jason Gunthorpe
Cc: Dan Williams
Cc: Peter Zijlstra
Cc: Michal Hocko
Cc: Mark Rutland
Cc: Mark Brown
Cc: Steven Price
Cc: Ard Biesheuvel
Cc: Masahiro Yamada
Cc: Kees Cook
Cc: Tetsuo Handa
Cc: Matthew Wilcox
Cc: Sri Krishna chowdary
Cc: Dave Hansen
Cc: Russell King - ARM Linux
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "David S. Miller"
Cc: Vineet Gupta
Cc: James Hogan
Cc: Paul Burton
Cc: Ralf Baechle
Cc: Kirill A. Shutemov
Cc: Gerald Schaefer
Cc: Christophe Leroy
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Tested-by: Christophe Leroy #PPC32
Suggested-by: Catalin Marinas
Signed-off-by: Andrew Morton
Signed-off-by: Christophe Leroy
Signed-off-by: Anshuman Khandual
---
 .../debug/debug-vm-pgtable/arch-support.txt |  34 ++
 arch/arm64/Kconfig                          |   1 +
 arch/x86/Kconfig                            |   1 +
 arch/x86/include/asm/pgtable_64.h           |   6 +
 include/asm-generic/pgtable.h               |   6 +
 init/main.c                                 |   1 +
 lib/Kconfig.debug                           |  21 +
 mm/Makefile                                 |   1 +
 mm/debug_vm_pgtable.c                       | 450 ++++++++++++++++++
 9 files changed, 521 insertions(+)
 create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt
 create mode 100644 mm/debug_vm_pgtable.c

diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index
000000000000..d6b8185dcf1e --- /dev/null +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt @@ -0,0 +1,34 @@ +# +# Feature name: debug-vm-pgtable +# Kconfig: ARCH_HAS_DEBUG_VM_PGTABLE +# description: arch supports pgtable tests for semantics compliance +# + ----------------------- + | arch |status| + ----------------------- + | alpha: | TODO | + | arc: | TODO | + | arm: | TODO | + | arm64: | ok | + | c6x: | TODO | + | csky: | TODO | + | h8300: | TODO | + | hexagon: | TODO | + | ia64: | TODO | + | m68k: | TODO | + | microblaze: | TODO | + | mips: | TODO | + | nds32: | TODO | + | nios2: | TODO | + | openrisc: | TODO | + | parisc: | TODO | + | powerpc: | TODO | + | riscv: | TODO | + | s390: | TODO | + | sh: | TODO | + | sparc: | TODO | + | um: | TODO | + | unicore32: | TODO | + | x86: | ok | + | xtensa: | TODO | + ----------------------- diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 950a56b71ff0..8a3b3eaa49e9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -11,6 +11,7 @@ config ARM64 select ACPI_PPTT if ACPI select ARCH_CLOCKSOURCE_DATA select ARCH_HAS_DEBUG_VIRTUAL + select ARCH_HAS_DEBUG_VM_PGTABLE select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_DMA_COHERENT_TO_PFN select ARCH_HAS_DMA_PREP_COHERENT diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index abe822d52167..13c9bd950256 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -61,6 +61,7 @@ config X86 select ARCH_CLOCKSOURCE_INIT select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI select ARCH_HAS_DEBUG_VIRTUAL + select ARCH_HAS_DEBUG_VM_PGTABLE select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FAST_MULTIPLIER diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 0b6c4042942a..fb0e76d254b3 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { } struct mm_struct; +#define mm_p4d_folded mm_p4d_folded +static 
inline bool mm_p4d_folded(struct mm_struct *mm) +{ + return !pgtable_l5_enabled(); +} + void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte); void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 818691846c90..7f97b7a4a9e2 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -1157,6 +1157,12 @@ static inline bool arch_has_pfn_modify_check(void) # define PAGE_KERNEL_EXEC PAGE_KERNEL #endif +#ifdef CONFIG_DEBUG_VM_PGTABLE +extern void debug_vm_pgtable(void); +#else +static inline void debug_vm_pgtable(void) { } +#endif + #endif /* !__ASSEMBLY__ */ #ifndef io_remap_pfn_range diff --git a/init/main.c b/init/main.c index 91f6ebb30ef0..676d8020dd29 100644 --- a/init/main.c +++ b/init/main.c @@ -1177,6 +1177,7 @@ static noinline void __init kernel_init_freeable(void) workqueue_init(); init_mm_internals(); + debug_vm_pgtable(); do_pre_smp_initcalls(); lockup_detector_init(); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cb1fc48e61e3..b9b8fe1e2180 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -698,6 +698,27 @@ config DEBUG_VM_PGFLAGS If unsure, say N. +config ARCH_HAS_DEBUG_VM_PGTABLE + bool + help + An architecture should select this when it can successfully + build and run DEBUG_VM_PGTABLE. + +config DEBUG_VM_PGTABLE + bool "Debug arch page table for semantics compliance" + depends on MMU + depends on DEBUG_VM + depends on ARCH_HAS_DEBUG_VM_PGTABLE + help + This option provides a debug method that can be used to test + architecture page table helper functions on various platforms, + verifying that they comply with the expected generic MM semantics. + This helps architecture code make sure that any changes to these + helpers, or new additions of them, still conform to the expected + semantics of the generic MM. + + If unsure, say N.
+ config ARCH_HAS_DEBUG_VIRTUAL bool diff --git a/mm/Makefile b/mm/Makefile index d996846697ef..2f085b971d34 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o +obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o obj-$(CONFIG_PAGE_OWNER) += page_owner.o obj-$(CONFIG_CLEANCACHE) += cleancache.o obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c new file mode 100644 index 000000000000..24da76d8b343 --- /dev/null +++ b/mm/debug_vm_pgtable.c @@ -0,0 +1,450 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This kernel module validates architecture page table helpers & + * accessors and helps in verifying their continued compliance with + * generic MM semantics. + * + * Copyright (C) 2019 ARM Ltd. + * + * Author: Anshuman Khandual + */ +#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Basic operations + * + * mkold(entry) = An old and not a young entry + * mkyoung(entry) = A young and not an old entry + * mkdirty(entry) = A dirty and not a clean entry + * mkclean(entry) = A clean and not a dirty entry + * mkwrite(entry) = A write and not a write protected entry + * wrprotect(entry) = A write protected and not a write entry + * pxx_bad(entry) = A mapped and non-table entry + * pxx_same(entry1, entry2) = Both entries hold the exact same value + */ +#define VMFLAGS (VM_READ|VM_WRITE|VM_EXEC) + +/* + * On s390 platform, the lower 12 bits are used to identify given page table + * entry type and for other arch specific requirements. But these bits might + * affect the ability to clear entries with pxx_clear(). 
So while loading up + * the entries skip all lower 12 bits in order to accommodate the s390 platform. + * It does not affect any other platform. + */ +#define RANDOM_ORVALUE (0xfffffffffffff000UL) +#define RANDOM_NZVALUE (0xff) + +static bool pud_aligned __initdata; +static bool pmd_aligned __initdata; + +static void __init pte_basic_tests(struct page *page, pgprot_t prot) +{ + pte_t pte = mk_pte(page, prot); + + WARN_ON(!pte_same(pte, pte)); + WARN_ON(!pte_young(pte_mkyoung(pte))); + WARN_ON(!pte_dirty(pte_mkdirty(pte))); + WARN_ON(!pte_write(pte_mkwrite(pte))); + WARN_ON(pte_young(pte_mkold(pte))); + WARN_ON(pte_dirty(pte_mkclean(pte))); + WARN_ON(pte_write(pte_wrprotect(pte))); +} + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE +static void __init pmd_basic_tests(struct page *page, pgprot_t prot) +{ + pmd_t pmd; + + /* + * Memory block here must be PMD_SIZE aligned. Abort this + * test in case we could not allocate such a memory block. + */ + if (!pmd_aligned) { + pr_warn("Could not proceed with PMD tests\n"); + return; + } + + pmd = mk_pmd(page, prot); + WARN_ON(!pmd_same(pmd, pmd)); + WARN_ON(!pmd_young(pmd_mkyoung(pmd))); + WARN_ON(!pmd_dirty(pmd_mkdirty(pmd))); + WARN_ON(!pmd_write(pmd_mkwrite(pmd))); + WARN_ON(pmd_young(pmd_mkold(pmd))); + WARN_ON(pmd_dirty(pmd_mkclean(pmd))); + WARN_ON(pmd_write(pmd_wrprotect(pmd))); + /* + * A huge page does not point to next level page table + * entry. Hence this must qualify as pmd_bad(). + */ + WARN_ON(!pmd_bad(pmd_mkhuge(pmd))); +} +#else +static void __init pmd_basic_tests(struct page *page, pgprot_t prot) { } +#endif + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static void __init pud_basic_tests(struct page *page, pgprot_t prot) +{ + pud_t pud; + + /* + * Memory block here must be PUD_SIZE aligned. Abort this + * test in case we could not allocate such a memory block.
+ */ + if (!pud_aligned) { + pr_warn("Could not proceed with PUD tests\n"); + return; + } + + pud = pfn_pud(page_to_pfn(page), prot); + WARN_ON(!pud_same(pud, pud)); + WARN_ON(!pud_young(pud_mkyoung(pud))); + WARN_ON(!pud_write(pud_mkwrite(pud))); + WARN_ON(pud_write(pud_wrprotect(pud))); + WARN_ON(pud_young(pud_mkold(pud))); + + if (mm_pmd_folded(mm) || __is_defined(ARCH_HAS_4LEVEL_HACK)) + return; + + /* + * A huge page does not point to next level page table + * entry. Hence this must qualify as pud_bad(). + */ + WARN_ON(!pud_bad(pud_mkhuge(pud))); +} +#else +static void __init pud_basic_tests(struct page *page, pgprot_t prot) { } +#endif + +static void __init p4d_basic_tests(struct page *page, pgprot_t prot) +{ + p4d_t p4d; + + memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t)); + WARN_ON(!p4d_same(p4d, p4d)); +} + +static void __init pgd_basic_tests(struct page *page, pgprot_t prot) +{ + pgd_t pgd; + + memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t)); + WARN_ON(!pgd_same(pgd, pgd)); +} + +#ifndef __ARCH_HAS_4LEVEL_HACK +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) +{ + pud_t pud = READ_ONCE(*pudp); + + if (mm_pmd_folded(mm)) + return; + + pud = __pud(pud_val(pud) | RANDOM_ORVALUE); + WRITE_ONCE(*pudp, pud); + pud_clear(pudp); + pud = READ_ONCE(*pudp); + WARN_ON(!pud_none(pud)); +} + +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp, + pmd_t *pmdp) +{ + pud_t pud; + + if (mm_pmd_folded(mm)) + return; + /* + * This entry points to next level page table page. + * Hence this must not qualify as pud_bad(). 
+ */ + pmd_clear(pmdp); + pud_clear(pudp); + pud_populate(mm, pudp, pmdp); + pud = READ_ONCE(*pudp); + WARN_ON(pud_bad(pud)); +} +#else +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { } +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp, + pmd_t *pmdp) +{ +} +#endif + +#ifndef __ARCH_HAS_5LEVEL_HACK +static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp) +{ + p4d_t p4d = READ_ONCE(*p4dp); + + if (mm_pud_folded(mm)) + return; + + p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE); + WRITE_ONCE(*p4dp, p4d); + p4d_clear(p4dp); + p4d = READ_ONCE(*p4dp); + WARN_ON(!p4d_none(p4d)); +} + +static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, + pud_t *pudp) +{ + p4d_t p4d; + + if (mm_pud_folded(mm)) + return; + + /* + * This entry points to next level page table page. + * Hence this must not qualify as p4d_bad(). + */ + pud_clear(pudp); + p4d_clear(p4dp); + p4d_populate(mm, p4dp, pudp); + p4d = READ_ONCE(*p4dp); + WARN_ON(p4d_bad(p4d)); +} + +static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) +{ + pgd_t pgd = READ_ONCE(*pgdp); + + if (mm_p4d_folded(mm)) + return; + + pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE); + WRITE_ONCE(*pgdp, pgd); + pgd_clear(pgdp); + pgd = READ_ONCE(*pgdp); + WARN_ON(!pgd_none(pgd)); +} + +static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, + p4d_t *p4dp) +{ + pgd_t pgd; + + if (mm_p4d_folded(mm)) + return; + + /* + * This entry points to next level page table page. + * Hence this must not qualify as pgd_bad(). 
+ */ + p4d_clear(p4dp); + pgd_clear(pgdp); + pgd_populate(mm, pgdp, p4dp); + pgd = READ_ONCE(*pgdp); + WARN_ON(pgd_bad(pgd)); +} +#else +static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp) { } +static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) { } +static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, + pud_t *pudp) +{ +} +static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, + p4d_t *p4dp) +{ +} +#endif + +static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep) +{ + pte_t pte = READ_ONCE(*ptep); + + pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + WRITE_ONCE(*ptep, pte); + pte_clear(mm, 0, ptep); + pte = READ_ONCE(*ptep); + WARN_ON(!pte_none(pte)); +} + +static void __init pmd_clear_tests(struct mm_struct *mm, pmd_t *pmdp) +{ + pmd_t pmd = READ_ONCE(*pmdp); + + pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE); + WRITE_ONCE(*pmdp, pmd); + pmd_clear(pmdp); + pmd = READ_ONCE(*pmdp); + WARN_ON(!pmd_none(pmd)); +} + +static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp, + pgtable_t pgtable) +{ + pmd_t pmd; + + /* + * This entry points to next level page table page. + * Hence this must not qualify as pmd_bad(). 
+ */ + pmd_clear(pmdp); + pmd_populate(mm, pmdp, pgtable); + pmd = READ_ONCE(*pmdp); + WARN_ON(pmd_bad(pmd)); +} + +#ifdef CONFIG_CONTIG_ALLOC +static struct page * __init alloc_pud_page(gfp_t gfp_mask) +{ + return alloc_gigantic_page_order(get_order(PUD_SIZE), gfp_mask, + first_memory_node, &node_states[N_MEMORY]); +} +#else +static struct page * __init alloc_pud_page(gfp_t gfp_mask) +{ + return NULL; +} +#endif + +static struct page * __init alloc_mapped_page(void) +{ + struct page *page; + gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO; + + page = alloc_pud_page(gfp_mask); + if (page) { + pud_aligned = true; + pmd_aligned = true; + return page; + } + + page = alloc_pages(gfp_mask, get_order(PMD_SIZE)); + if (page) { + pmd_aligned = true; + return page; + } + return alloc_page(gfp_mask); +} + +static void __init free_mapped_page(struct page *page) +{ + if (pud_aligned) { + unsigned long pfn = page_to_pfn(page); + + free_contig_range(pfn, 1ULL << get_order(PUD_SIZE)); + return; + } + + if (pmd_aligned) { + int order = get_order(PMD_SIZE); + + free_pages((unsigned long)page_address(page), order); + return; + } + free_page((unsigned long)page_address(page)); +} + +static unsigned long __init get_random_vaddr(void) +{ + unsigned long random_vaddr, random_pages, total_user_pages; + + total_user_pages = (TASK_SIZE - FIRST_USER_ADDRESS) / PAGE_SIZE; + + random_pages = get_random_long() % total_user_pages; + random_vaddr = FIRST_USER_ADDRESS + random_pages * PAGE_SIZE; + + WARN_ON(random_vaddr > TASK_SIZE); + WARN_ON(random_vaddr < FIRST_USER_ADDRESS); + return random_vaddr; +} + +void __init debug_vm_pgtable(void) +{ + struct mm_struct *mm; + struct page *page; + pgd_t *pgdp; + p4d_t *p4dp, *saved_p4dp; + pud_t *pudp, *saved_pudp; + pmd_t *pmdp, *saved_pmdp, pmd; + pte_t *ptep; + pgtable_t saved_ptep; + pgprot_t prot; + unsigned long vaddr; + + prot = vm_get_page_prot(VMFLAGS); + vaddr = get_random_vaddr(); + mm = mm_alloc(); + if (!mm) { + pr_err("mm_struct allocation
failed\n"); + return; + } + + page = alloc_mapped_page(); + if (!page) { + pr_err("memory allocation failed\n"); + return; + } + + pgdp = pgd_offset(mm, vaddr); + p4dp = p4d_alloc(mm, pgdp, vaddr); + pudp = pud_alloc(mm, p4dp, vaddr); + pmdp = pmd_alloc(mm, pudp, vaddr); + ptep = pte_alloc_map(mm, pmdp, vaddr); + + /* + * Save all the page table page addresses as the page table + * entries will be used for testing with random or garbage + * values. These saved addresses will be used for freeing + * page table pages. + */ + pmd = READ_ONCE(*pmdp); + saved_p4dp = p4d_offset(pgdp, 0UL); + saved_pudp = pud_offset(p4dp, 0UL); + saved_pmdp = pmd_offset(pudp, 0UL); + saved_ptep = pmd_pgtable(pmd); + + pte_basic_tests(page, prot); + pmd_basic_tests(page, prot); + pud_basic_tests(page, prot); + p4d_basic_tests(page, prot); + pgd_basic_tests(page, prot); + + pte_clear_tests(mm, ptep); + pmd_clear_tests(mm, pmdp); + pud_clear_tests(mm, pudp); + p4d_clear_tests(mm, p4dp); + pgd_clear_tests(mm, pgdp); + + pte_unmap(ptep); + + pmd_populate_tests(mm, pmdp, saved_ptep); + pud_populate_tests(mm, pudp, saved_pmdp); + p4d_populate_tests(mm, p4dp, saved_pudp); + pgd_populate_tests(mm, pgdp, saved_p4dp); + + p4d_free(mm, saved_p4dp); + pud_free(mm, saved_pudp); + pmd_free(mm, saved_pmdp); + pte_free(mm, saved_ptep); + + mm_dec_nr_puds(mm); + mm_dec_nr_pmds(mm); + mm_dec_nr_ptes(mm); + __mmdrop(mm); + + free_mapped_page(page); +}
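Each pxx_clear_tests() function above follows the same pattern. It ORs RANDOM_ORVALUE into the entry, clears the entry, and warns unless the result reads as none. The toy model below shows that the constant sets every bit above bit 11 while leaving the low 12 bits as they were, which is what the s390 comment asks for. It is a userspace sketch only, and toy_entry_t, toy_clear(), toy_none() and load_garbage() are hypothetical. The toy clear simply zeroes the entry, whereas the real pxx_clear() helpers are architecture specific. The model uses uint64_t so that it stays 64-bit on any host; the patch itself uses unsigned long.

```c
#include <stdbool.h>
#include <stdint.h>

#define RANDOM_ORVALUE 0xfffffffffffff000ULL	/* same bit pattern as the patch */
#define LOW_BITS_MASK  0xfffULL			/* the low 12 bits the OR leaves alone */

/* Hypothetical 64-bit entry standing in for pte_t, pmd_t, pud_t and so on. */
typedef struct { uint64_t val; } toy_entry_t;

static void toy_clear(toy_entry_t *e) { e->val = 0; }		/* like pxx_clear() */
static bool toy_none(toy_entry_t e) { return e.val == 0; }	/* like pxx_none() */

/* Load garbage the way the clear tests do: force bits 12..63 on, keep 0..11. */
static toy_entry_t load_garbage(toy_entry_t e)
{
	e.val |= RANDOM_ORVALUE;
	return e;
}

/* Mirror of pxx_clear_tests(): load garbage, clear it, expect an empty entry. */
static bool toy_clear_test(toy_entry_t *slot)
{
	*slot = load_garbage(*slot);
	toy_clear(slot);
	return toy_none(*slot);
}
```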