From patchwork Mon Apr 13 12:53:00 2020
From: Nicholas Piggin
To: linux-mm@kvack.org
Cc: Nicholas Piggin, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Catalin Marinas, Will Deacon,
    linux-arm-kernel@lists.infradead.org, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Subject: [PATCH v2 1/4] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
Date: Mon, 13 Apr 2020 22:53:00 +1000
Message-Id: <20200413125303.423864-2-npiggin@gmail.com>
In-Reply-To: <20200413125303.423864-1-npiggin@gmail.com>
References: <20200413125303.423864-1-npiggin@gmail.com>

vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller cannot be expected to
know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns the
struct page that corresponds to the offset within the large page. This
makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
    fail gracefully on unexpected huge vmap mappings")

Signed-off-by: Nicholas Piggin
---
 mm/vmalloc.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 399f219544f7..1afec7def23f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -272,7 +273,9 @@ int is_vmalloc_or_module_addr(const void *x)
 }
 
 /*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
 */
 struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
@@ -292,25 +295,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 	if (pgd_none(*pgd))
 		return NULL;
+	if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+		return NULL; /* XXX: no allowance for huge pgd */
+	if (WARN_ON_ONCE(pgd_bad(*pgd)))
+		return NULL;
+
 	p4d = p4d_offset(pgd, addr);
 	if (p4d_none(*p4d))
 		return NULL;
-	pud = pud_offset(p4d, addr);
+	if (p4d_leaf(*p4d))
+		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+	if (WARN_ON_ONCE(p4d_bad(*p4d)))
+		return NULL;
 
-	/*
-	 * Don't dereference bad PUD or PMD (below) entries. This will also
-	 * identify huge mappings, which we may encounter on architectures
-	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
-	 * not [unambiguously] associated with a struct page, so there is
-	 * no correct value to return for them.
-	 */
-	WARN_ON_ONCE(pud_bad(*pud));
-	if (pud_none(*pud) || pud_bad(*pud))
+	pud = pud_offset(p4d, addr);
+	if (pud_none(*pud))
+		return NULL;
+	if (pud_leaf(*pud))
+		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	if (WARN_ON_ONCE(pud_bad(*pud)))
 		return NULL;
+
 	pmd = pmd_offset(pud, addr);
-	WARN_ON_ONCE(pmd_bad(*pmd));
-	if (pmd_none(*pmd) || pmd_bad(*pmd))
+	if (pmd_none(*pmd))
+		return NULL;
+	if (pmd_leaf(*pmd))
+		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	if (WARN_ON_ONCE(pmd_bad(*pmd)))
 		return NULL;
 
 	ptep = pte_offset_map(pmd, addr);
@@ -318,6 +329,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 	if (pte_present(pte))
 		page = pte_page(pte);
 	pte_unmap(ptep);
+
 	return page;
 }
 EXPORT_SYMBOL(vmalloc_to_page);
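
To make the offset arithmetic concrete, here is a small standalone C sketch
(not part of the patch): it reproduces the (addr & ~PMD_MASK) >> PAGE_SHIFT
calculation used above to find which small page inside a PMD-sized mapping an
address falls in. The shift values assume a 4K base page with 2MB PMD leaf
mappings; other configurations differ.

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PMD_SHIFT	21
	#define PAGE_SIZE	(1UL << PAGE_SHIFT)
	#define PMD_MASK	(~((1UL << PMD_SHIFT) - 1))

	int main(void)
	{
		/* an address 5 small pages into a 2MB-aligned huge vmap mapping */
		unsigned long addr = 0x200000UL + 5 * PAGE_SIZE;

		/* same arithmetic as pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT) */
		unsigned long page_index = (addr & ~PMD_MASK) >> PAGE_SHIFT;

		printf("page index within huge mapping: %lu\n", page_index);
		return 0;
	}

This prints an index of 5, i.e. the fifth small page backing the huge
mapping, which is the struct page the patched vmalloc_to_page now returns.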

From patchwork Mon Apr 13 12:53:01 2020
From: Nicholas Piggin
To: linux-mm@kvack.org
Cc: Nicholas Piggin, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Catalin Marinas, Will Deacon,
    linux-arm-kernel@lists.infradead.org, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H.
Peter Anvin" Subject: [PATCH v2 2/4] mm: Move ioremap page table mapping function to mm/ Date: Mon, 13 Apr 2020 22:53:01 +1000 Message-Id: <20200413125303.423864-3-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20200413125303.423864-1-npiggin@gmail.com> References: <20200413125303.423864-1-npiggin@gmail.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: ioremap_page_range is a generic function to create a kernel virtual mapping, move it to mm/vmalloc.c and rename it vmap_range. For clarity with this move, also: - Rename vunmap_page_range (vmap_range's inverse) to vunmap_range. - Rename vmap_pages_range (which takes a page array) to vmap_pages. Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 3 + lib/ioremap.c | 182 +++--------------------------- mm/vmalloc.c | 239 ++++++++++++++++++++++++++++++++++++---- 3 files changed, 239 insertions(+), 185 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 0507a162ccd0..eb8a5080e472 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -173,6 +173,9 @@ extern struct vm_struct *find_vm_area(const void *addr); extern int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages); #ifdef CONFIG_MMU +int vmap_range(unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift); extern int map_kernel_range_noflush(unsigned long start, unsigned long size, pgprot_t prot, struct page **pages); extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size); diff --git a/lib/ioremap.c b/lib/ioremap.c index 3f0e18543de8..7e383bdc51ad 100644 --- a/lib/ioremap.c +++ b/lib/ioremap.c @@ -60,176 +60,26 @@ static inline int ioremap_pud_enabled(void) { return 0; } static inline int ioremap_pmd_enabled(void) { return 0; } #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ -static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot) -{ - pte_t *pte; - u64 pfn; - - pfn = phys_addr >> PAGE_SHIFT; - pte = pte_alloc_kernel(pmd, addr); - if (!pte) - return -ENOMEM; - do { - BUG_ON(!pte_none(*pte)); - set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot)); - pfn++; - } while (pte++, addr += PAGE_SIZE, addr != end); - return 0; -} - -static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_pmd_enabled()) - return 0; - - if ((end - addr) != PMD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PMD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PMD_SIZE)) - return 0; - - if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) - return 0; - - return pmd_set_huge(pmd, phys_addr, prot); -} - -static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot) -{ - pmd_t *pmd; - unsigned long next; - - pmd = pmd_alloc(&init_mm, pud, addr); - if (!pmd) - return -ENOMEM; - do { - next = pmd_addr_end(addr, end); - - if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) - continue; - - if (ioremap_pte_range(pmd, addr, next, phys_addr, prot)) - return -ENOMEM; - } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - -static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_pud_enabled()) - return 0; - 
- if ((end - addr) != PUD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PUD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PUD_SIZE)) - return 0; - - if (pud_present(*pud) && !pud_free_pmd_page(pud, addr)) - return 0; - - return pud_set_huge(pud, phys_addr, prot); -} - -static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot) -{ - pud_t *pud; - unsigned long next; - - pud = pud_alloc(&init_mm, p4d, addr); - if (!pud) - return -ENOMEM; - do { - next = pud_addr_end(addr, end); - - if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) - continue; - - if (ioremap_pmd_range(pud, addr, next, phys_addr, prot)) - return -ENOMEM; - } while (pud++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - -static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_p4d_enabled()) - return 0; - - if ((end - addr) != P4D_SIZE) - return 0; - - if (!IS_ALIGNED(addr, P4D_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, P4D_SIZE)) - return 0; - - if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr)) - return 0; - - return p4d_set_huge(p4d, phys_addr, prot); -} - -static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot) -{ - p4d_t *p4d; - unsigned long next; - - p4d = p4d_alloc(&init_mm, pgd, addr); - if (!p4d) - return -ENOMEM; - do { - next = p4d_addr_end(addr, end); - - if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot)) - continue; - - if (ioremap_pud_range(p4d, addr, next, phys_addr, prot)) - return -ENOMEM; - } while (p4d++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - int ioremap_page_range(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot) { - pgd_t *pgd; - unsigned long start; - unsigned long next; - int err; - - might_sleep(); - BUG_ON(addr >= end); - - start = addr; - pgd = pgd_offset_k(addr); - do { - next = pgd_addr_end(addr, end); - err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot); - if (err) - break; - } while (pgd++, phys_addr += (next - addr), addr = next, addr != end); - - flush_cache_vmap(start, end); + unsigned int max_page_shift = PAGE_SHIFT; + + /* + * Due to the max_page_shift parameter to vmap_range, platforms must + * enable all smaller sizes to take advantage of a given size, + * otherwise fall back to small pages. 
+ */ + if (ioremap_pmd_enabled()) { + max_page_shift = PMD_SHIFT; + if (ioremap_pud_enabled()) { + max_page_shift = PUD_SHIFT; + if (ioremap_p4d_enabled()) + max_page_shift = P4D_SHIFT; + } + } - return err; + return vmap_range(addr, end, phys_addr, prot, max_page_shift); } #ifdef CONFIG_GENERIC_IOREMAP diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 1afec7def23f..b1bc2fcae4e0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -128,7 +128,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end) } while (p4d++, addr = next, addr != end); } -static void vunmap_page_range(unsigned long addr, unsigned long end) +static void vunmap_range(unsigned long addr, unsigned long end) { pgd_t *pgd; unsigned long next; @@ -143,7 +143,208 @@ static void vunmap_page_range(unsigned long addr, unsigned long end) } while (pgd++, addr = next, addr != end); } -static int vmap_pte_range(pmd_t *pmd, unsigned long addr, +static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot) +{ + pte_t *pte; + u64 pfn; + + pfn = phys_addr >> PAGE_SHIFT; + pte = pte_alloc_kernel(pmd, addr); + if (!pte) + return -ENOMEM; + do { + BUG_ON(!pte_none(*pte)); + set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot)); + pfn++; + } while (pte++, addr += PAGE_SIZE, addr != end); + return 0; +} + +static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + return 0; + + if (max_page_shift < PMD_SHIFT) + return 0; + + if ((end - addr) != PMD_SIZE) + return 0; + + if (!IS_ALIGNED(addr, PMD_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, PMD_SIZE)) + return 0; + + if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) + return 0; + + return pmd_set_huge(pmd, phys_addr, prot); +} + +static inline int vmap_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + pmd_t *pmd; + unsigned long next; + + pmd = pmd_alloc(&init_mm, pud, addr); + if (!pmd) + return -ENOMEM; + do { + next = pmd_addr_end(addr, end); + + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot, + max_page_shift)) + continue; + + if (vmap_pte_range(pmd, addr, next, phys_addr, prot)) + return -ENOMEM; + } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); + return 0; +} + +static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + return 0; + + if (max_page_shift < PUD_SHIFT) + return 0; + + if ((end - addr) != PUD_SIZE) + return 0; + + if (!IS_ALIGNED(addr, PUD_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, PUD_SIZE)) + return 0; + + if (pud_present(*pud) && !pud_free_pmd_page(pud, addr)) + return 0; + + return pud_set_huge(pud, phys_addr, prot); +} + +static inline int vmap_pud_range(p4d_t *p4d, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + pud_t *pud; + unsigned long next; + + pud = pud_alloc(&init_mm, p4d, addr); + if (!pud) + return -ENOMEM; + do { + next = pud_addr_end(addr, end); + + if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot, + max_page_shift)) + continue; + + if (vmap_pmd_range(pud, addr, next, phys_addr, prot, + max_page_shift)) + return -ENOMEM; + } while (pud++, phys_addr += (next - addr), addr = next, addr != end); + 
return 0; +} + +static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + return 0; + + if (max_page_shift < P4D_SHIFT) + return 0; + + if ((end - addr) != P4D_SIZE) + return 0; + + if (!IS_ALIGNED(addr, P4D_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, P4D_SIZE)) + return 0; + + if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr)) + return 0; + + return p4d_set_huge(p4d, phys_addr, prot); +} + +static inline int vmap_p4d_range(pgd_t *pgd, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + p4d_t *p4d; + unsigned long next; + + p4d = p4d_alloc(&init_mm, pgd, addr); + if (!p4d) + return -ENOMEM; + do { + next = p4d_addr_end(addr, end); + + if (vmap_try_huge_p4d(p4d, addr, next, phys_addr, prot, + max_page_shift)) + continue; + + if (vmap_pud_range(p4d, addr, next, phys_addr, prot, + max_page_shift)) + return -ENOMEM; + } while (p4d++, phys_addr += (next - addr), addr = next, addr != end); + return 0; +} + +static int vmap_range_noflush(unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + pgd_t *pgd; + unsigned long start; + unsigned long next; + int err; + + might_sleep(); + BUG_ON(addr >= end); + + start = addr; + pgd = pgd_offset_k(addr); + do { + next = pgd_addr_end(addr, end); + err = vmap_p4d_range(pgd, addr, next, phys_addr, prot, + max_page_shift); + if (err) + break; + } while (pgd++, phys_addr += (next - addr), addr = next, addr != end); + + return err; +} + +int vmap_range(unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + unsigned int max_page_shift) +{ + int ret; + + ret = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift); + flush_cache_vmap(addr, end); + + return ret; +} + +static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr) { pte_t *pte; @@ -169,7 +370,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, return 0; } -static int vmap_pmd_range(pud_t *pud, unsigned long addr, +static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr) { pmd_t *pmd; @@ -180,13 +381,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr, return -ENOMEM; do { next = pmd_addr_end(addr, end); - if (vmap_pte_range(pmd, addr, next, prot, pages, nr)) + if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr)) return -ENOMEM; } while (pmd++, addr = next, addr != end); return 0; } -static int vmap_pud_range(p4d_t *p4d, unsigned long addr, +static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr) { pud_t *pud; @@ -197,13 +398,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr, return -ENOMEM; do { next = pud_addr_end(addr, end); - if (vmap_pmd_range(pud, addr, next, prot, pages, nr)) + if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr)) return -ENOMEM; } while (pud++, addr = next, addr != end); return 0; } -static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, +static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr) { p4d_t *p4d; @@ -214,7 +415,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, return -ENOMEM; do { next = p4d_addr_end(addr, end); - if 
(vmap_pud_range(p4d, addr, next, prot, pages, nr)) + if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr)) return -ENOMEM; } while (p4d++, addr = next, addr != end); return 0; @@ -226,7 +427,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, * * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N] */ -static int vmap_page_range_noflush(unsigned long start, unsigned long end, +static int vmap_pages_range_noflush(unsigned long start, unsigned long end, pgprot_t prot, struct page **pages) { pgd_t *pgd; @@ -239,7 +440,7 @@ static int vmap_page_range_noflush(unsigned long start, unsigned long end, pgd = pgd_offset_k(addr); do { next = pgd_addr_end(addr, end); - err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr); + err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr); if (err) return err; } while (pgd++, addr = next, addr != end); @@ -247,12 +448,12 @@ static int vmap_page_range_noflush(unsigned long start, unsigned long end, return nr; } -static int vmap_page_range(unsigned long start, unsigned long end, +static int vmap_pages_range(unsigned long start, unsigned long end, pgprot_t prot, struct page **pages) { int ret; - ret = vmap_page_range_noflush(start, end, prot, pages); + ret = vmap_pages_range_noflush(start, end, prot, pages); flush_cache_vmap(start, end); return ret; } @@ -1238,7 +1439,7 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier); */ static void unmap_vmap_area(struct vmap_area *va) { - vunmap_page_range(va->va_start, va->va_end); + vunmap_range(va->va_start, va->va_end); } /* @@ -1699,7 +1900,7 @@ static void vb_free(const void *addr, unsigned long size) rcu_read_unlock(); BUG_ON(!vb); - vunmap_page_range((unsigned long)addr, (unsigned long)addr + size); + vunmap_range((unsigned long)addr, (unsigned long)addr + size); if (debug_pagealloc_enabled_static()) flush_tlb_kernel_range((unsigned long)addr, @@ -1854,7 +2055,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro kasan_unpoison_vmalloc(mem, size); - if (vmap_page_range(addr, addr + size, prot, pages) < 0) { + if (vmap_pages_range(addr, addr + size, prot, pages) < 0) { vm_unmap_ram(mem, count); return NULL; } @@ -2020,7 +2221,7 @@ void __init vmalloc_init(void) int map_kernel_range_noflush(unsigned long addr, unsigned long size, pgprot_t prot, struct page **pages) { - return vmap_page_range_noflush(addr, addr + size, prot, pages); + return vmap_pages_range_noflush(addr, addr + size, prot, pages); } /** @@ -2039,7 +2240,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned long size, */ void unmap_kernel_range_noflush(unsigned long addr, unsigned long size) { - vunmap_page_range(addr, addr + size); + vunmap_range(addr, addr + size); } EXPORT_SYMBOL_GPL(unmap_kernel_range_noflush); @@ -2056,7 +2257,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size) unsigned long end = addr + size; flush_cache_vunmap(addr, end); - vunmap_page_range(addr, end); + vunmap_range(addr, end); flush_tlb_kernel_range(addr, end); } EXPORT_SYMBOL_GPL(unmap_kernel_range); @@ -2067,7 +2268,7 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages) unsigned long end = addr + get_vm_area_size(area); int err; - err = vmap_page_range(addr, end, prot, pages); + err = vmap_pages_range(addr, end, prot, pages); return err > 0 ? 
0 : err; }
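
A note on the max_page_shift comment in this patch: the cascade in the new
ioremap_page_range() only raises the limit one level at a time, so an
architecture that enables PUD but not PMD mappings still ends up with small
pages. Below is a standalone sketch (illustrative only, not kernel code) of
that selection logic; the boolean flags stand in for the ioremap_*_enabled()
helpers and the shift values assume 4K base pages with 2MB/1GB leaf sizes.

	#include <stdio.h>
	#include <stdbool.h>

	#define PAGE_SHIFT	12
	#define PMD_SHIFT	21
	#define PUD_SHIFT	30

	/* a level is only considered if every smaller level is enabled too */
	static unsigned int pick_max_page_shift(bool pmd_ok, bool pud_ok)
	{
		unsigned int max_page_shift = PAGE_SHIFT;

		if (pmd_ok) {
			max_page_shift = PMD_SHIFT;
			if (pud_ok)
				max_page_shift = PUD_SHIFT;
			/* the P4D level would be checked here, as in the patch */
		}
		return max_page_shift;
	}

	int main(void)
	{
		printf("pmd+pud enabled: shift %u\n", pick_max_page_shift(true, true));
		printf("pud only:        shift %u\n", pick_max_page_shift(false, true));
		return 0;
	}

The second call falls back to PAGE_SHIFT even though PUD mappings are
nominally enabled, which is exactly the fallback behaviour the comment warns
platforms about.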

From patchwork Mon Apr 13 12:53:02 2020
From: Nicholas Piggin
To: linux-mm@kvack.org
Cc: Nicholas Piggin, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Catalin Marinas, Will Deacon,
    linux-arm-kernel@lists.infradead.org, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Subject: [PATCH v2 3/4] mm: HUGE_VMAP arch query functions cleanup
Date: Mon, 13 Apr 2020 22:53:02 +1000
Message-Id: <20200413125303.423864-4-npiggin@gmail.com>
In-Reply-To: <20200413125303.423864-1-npiggin@gmail.com>
References: <20200413125303.423864-1-npiggin@gmail.com>

This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call. This allows odd
configurations such as PUD-but-not-PMD, and will make it easier to
constant-fold dead code away if the arch inlines unsupported levels.

This also adds a prot argument to the arch query. It is unused currently
but could help with some architectures (some powerpc implementations
can't map uncacheable memory with large pages, for example).

The name is changed from ioremap to vmap, as it will be used more
generally in the next patch.
Signed-off-by: Nicholas Piggin Reported-by: kbuild test robot Reported-by: kbuild test robot Reported-by: kbuild test robot --- arch/arm64/mm/mmu.c | 8 ++-- arch/powerpc/mm/book3s64/radix_pgtable.c | 6 +-- arch/x86/mm/ioremap.c | 6 +-- include/linux/io.h | 3 -- include/linux/vmalloc.h | 10 +++++ lib/ioremap.c | 51 ++---------------------- mm/vmalloc.c | 9 +++++ 7 files changed, 33 insertions(+), 60 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index a374e4f51a62..b8e381c46fa1 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1244,12 +1244,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot) return dt_virt; } -int __init arch_ioremap_p4d_supported(void) +bool arch_vmap_p4d_supported(pgprot_t prot) { return 0; } -int __init arch_ioremap_pud_supported(void) +bool arch_vmap_pud_supported(pgprot_t prot) { /* * Only 4k granule supports level 1 block mappings. @@ -1259,9 +1259,9 @@ int __init arch_ioremap_pud_supported(void) !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } -int __init arch_ioremap_pmd_supported(void) +bool arch_vmap_pmd_supported(pgprot_t prot) { - /* See arch_ioremap_pud_supported() */ + /* See arch_vmap_pud_supported() */ return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 8f9edf07063a..5130e7912dd4 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -1091,13 +1091,13 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, set_pte_at(mm, addr, ptep, pte); } -int __init arch_ioremap_pud_supported(void) +bool arch_vmap_pud_supported(pgprot_t prot) { /* HPT does not cope with large pages in the vmalloc area */ return radix_enabled(); } -int __init arch_ioremap_pmd_supported(void) +bool arch_vmap_pmd_supported(pgprot_t prot) { return radix_enabled(); } @@ -1191,7 +1191,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) return 1; } -int __init arch_ioremap_p4d_supported(void) +bool arch_vmap_p4d_supported(pgprot_t prot) { return 0; } diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 18c637c0dc6f..bb4b75c344e4 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -481,12 +481,12 @@ void iounmap(volatile void __iomem *addr) } EXPORT_SYMBOL(iounmap); -int __init arch_ioremap_p4d_supported(void) +bool arch_vmap_p4d_supported(pgprot_t prot) { return 0; } -int __init arch_ioremap_pud_supported(void) +bool arch_vmap_pud_supported(pgprot_t prot) { #ifdef CONFIG_X86_64 return boot_cpu_has(X86_FEATURE_GBPAGES); @@ -495,7 +495,7 @@ int __init arch_ioremap_pud_supported(void) #endif } -int __init arch_ioremap_pmd_supported(void) +bool arch_vmap_pmd_supported(pgprot_t prot) { return boot_cpu_has(X86_FEATURE_PSE); } diff --git a/include/linux/io.h b/include/linux/io.h index 8394c56babc2..2832e051bc2e 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -33,9 +33,6 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end, #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP void __init ioremap_huge_init(void); -int arch_ioremap_p4d_supported(void); -int arch_ioremap_pud_supported(void); -int arch_ioremap_pmd_supported(void); #else static inline void ioremap_huge_init(void) { } #endif diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index eb8a5080e472..291313a7e663 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -84,6 +84,16 @@ struct vmap_area { }; }; +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +bool 
arch_vmap_p4d_supported(pgprot_t prot); +bool arch_vmap_pud_supported(pgprot_t prot); +bool arch_vmap_pmd_supported(pgprot_t prot); +#else +static inline bool arch_vmap_p4d_supported(pgprot_t prot) { return false; } +static inline bool arch_vmap_pud_supported(pgprot_t prot) { return false; } +static inline bool arch_vmap_pmd_supported(prprot_t prot) { return false; } +#endif + /* * Highlevel APIs for driver use */ diff --git a/lib/ioremap.c b/lib/ioremap.c index 7e383bdc51ad..0a1ddf1a1286 100644 --- a/lib/ioremap.c +++ b/lib/ioremap.c @@ -14,10 +14,9 @@ #include #include +static unsigned int __read_mostly max_page_shift = PAGE_SHIFT; + #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -static int __read_mostly ioremap_p4d_capable; -static int __read_mostly ioremap_pud_capable; -static int __read_mostly ioremap_pmd_capable; static int __read_mostly ioremap_huge_disabled; static int __init set_nohugeiomap(char *str) @@ -29,56 +28,14 @@ early_param("nohugeiomap", set_nohugeiomap); void __init ioremap_huge_init(void) { - if (!ioremap_huge_disabled) { - if (arch_ioremap_p4d_supported()) - ioremap_p4d_capable = 1; - if (arch_ioremap_pud_supported()) - ioremap_pud_capable = 1; - if (arch_ioremap_pmd_supported()) - ioremap_pmd_capable = 1; - } -} - -static inline int ioremap_p4d_enabled(void) -{ - return ioremap_p4d_capable; -} - -static inline int ioremap_pud_enabled(void) -{ - return ioremap_pud_capable; + if (!ioremap_huge_disabled) + max_page_shift = P4D_SHIFT; } - -static inline int ioremap_pmd_enabled(void) -{ - return ioremap_pmd_capable; -} - -#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */ -static inline int ioremap_p4d_enabled(void) { return 0; } -static inline int ioremap_pud_enabled(void) { return 0; } -static inline int ioremap_pmd_enabled(void) { return 0; } #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ int ioremap_page_range(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot) { - unsigned int max_page_shift = PAGE_SHIFT; - - /* - * Due to the max_page_shift parameter to vmap_range, platforms must - * enable all smaller sizes to take advantage of a given size, - * otherwise fall back to small pages. 
-	 */
-	if (ioremap_pmd_enabled()) {
-		max_page_shift = PMD_SHIFT;
-		if (ioremap_pud_enabled()) {
-			max_page_shift = PUD_SHIFT;
-			if (ioremap_p4d_enabled())
-				max_page_shift = P4D_SHIFT;
-		}
-	}
-
 	return vmap_range(addr, end, phys_addr, prot, max_page_shift);
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b1bc2fcae4e0..c898d16ddd25 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -171,6 +171,9 @@ static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (max_page_shift < PMD_SHIFT)
 		return 0;
 
+	if (!arch_vmap_pmd_supported(prot))
+		return 0;
+
 	if ((end - addr) != PMD_SIZE)
 		return 0;
 
@@ -219,6 +222,9 @@ static int vmap_try_huge_pud(pud_t *pud, unsigned long addr,
 	if (max_page_shift < PUD_SHIFT)
 		return 0;
 
+	if (!arch_vmap_pud_supported(prot))
+		return 0;
+
 	if ((end - addr) != PUD_SIZE)
 		return 0;
 
@@ -268,6 +274,9 @@ static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
 	if (max_page_shift < P4D_SHIFT)
 		return 0;
 
+	if (!arch_vmap_p4d_supported(prot))
+		return 0;
+
 	if ((end - addr) != P4D_SIZE)
 		return 0;
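
To illustrate the per-call query style this patch introduces, here is a
standalone sketch (not kernel code): the function name mirrors
arch_vmap_pmd_supported(), but the prot encoding and the uncacheable rule are
invented for the example, echoing the powerpc case mentioned in the
changelog.

	#include <stdio.h>
	#include <stdbool.h>

	typedef unsigned long pgprot_t;

	#define PAGE_KERNEL	((pgprot_t)0x1)	/* normal cacheable mapping (stand-in) */
	#define PAGE_KERNEL_NC	((pgprot_t)0x2)	/* "uncacheable" mapping (hypothetical) */

	/* per-call query, shaped like the new arch_vmap_pmd_supported(prot) */
	static bool arch_vmap_pmd_supported(pgprot_t prot)
	{
		/* e.g. an arch that cannot use large pages for uncacheable memory */
		return prot != PAGE_KERNEL_NC;
	}

	int main(void)
	{
		printf("PMD leaf allowed for PAGE_KERNEL:    %d\n",
		       arch_vmap_pmd_supported(PAGE_KERNEL));
		printf("PMD leaf allowed for PAGE_KERNEL_NC: %d\n",
		       arch_vmap_pmd_supported(PAGE_KERNEL_NC));
		return 0;
	}

Because the decision is made per mapping attempt rather than latched once at
init time, an arch that inlines such a query as a constant lets the compiler
discard the corresponding huge-mapping branch entirely.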

From patchwork Mon Apr 13 12:53:03 2020
From: Nicholas Piggin
To: linux-mm@kvack.org
Cc: Nicholas Piggin, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Catalin Marinas, Will Deacon,
    linux-arm-kernel@lists.infradead.org, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Peter Anvin" Subject: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings Date: Mon, 13 Apr 2020 22:53:03 +1000 Message-Id: <20200413125303.423864-5-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20200413125303.423864-1-npiggin@gmail.com> References: <20200413125303.423864-1-npiggin@gmail.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings, have vmalloc attempt to allocate PMD-sized pages first, before falling back to small pages. Allocations which use something other than PAGE_KERNEL protections are not permitted to use huge pages yet, not all callers expect this (e.g., module allocations vs strict module rwx). This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9. This can result in more internal fragmentation and memory overhead for a given allocation. It can also cause greater NUMA unbalance on hashdist allocations. There may be other callers that expect small pages under vmalloc but use PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An alternative would be a new function or flag which enables large mappings, and use that in callers. Signed-off-by: Nicholas Piggin --- include/linux/vmalloc.h | 2 + mm/vmalloc.c | 135 +++++++++++++++++++++++++++++----------- 2 files changed, 102 insertions(+), 35 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 291313a7e663..853b82eac192 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */ #define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */ #define VM_NO_GUARD 0x00000040 /* don't add guard page */ #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */ +#define VM_HUGE_PAGES 0x00000100 /* may use huge pages */ /* * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC. @@ -58,6 +59,7 @@ struct vm_struct { unsigned long size; unsigned long flags; struct page **pages; + unsigned int page_order; unsigned int nr_pages; phys_addr_t phys_addr; const void *caller; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index c898d16ddd25..7b7e992c5ff1 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -436,7 +436,7 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr, * * Ie. 
pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N] */ -static int vmap_pages_range_noflush(unsigned long start, unsigned long end, +static int vmap_small_pages_range_noflush(unsigned long start, unsigned long end, pgprot_t prot, struct page **pages) { pgd_t *pgd; @@ -457,13 +457,44 @@ static int vmap_pages_range_noflush(unsigned long start, unsigned long end, return nr; } +static int vmap_pages_range_noflush(unsigned long start, unsigned long end, + pgprot_t prot, struct page **pages, + unsigned int page_shift) +{ + if (page_shift == PAGE_SIZE) { + return vmap_small_pages_range_noflush(start, end, prot, pages); + } else { + unsigned long addr = start; + unsigned int i, nr = (end - start) >> page_shift; + + for (i = 0; i < nr; i++) { + int err; + + err = vmap_range_noflush(addr, + addr + (1UL << page_shift), + __pa(page_address(pages[i])), prot, + page_shift); + if (err) + return err; + + addr += 1UL << page_shift; + } + + return 0; + } +} + static int vmap_pages_range(unsigned long start, unsigned long end, - pgprot_t prot, struct page **pages) + pgprot_t prot, struct page **pages, + unsigned int page_shift) { int ret; - ret = vmap_pages_range_noflush(start, end, prot, pages); + BUG_ON(page_shift < PAGE_SHIFT); + + ret = vmap_pages_range_noflush(start, end, prot, pages, page_shift); flush_cache_vmap(start, end); + return ret; } @@ -2064,7 +2095,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro kasan_unpoison_vmalloc(mem, size); - if (vmap_pages_range(addr, addr + size, prot, pages) < 0) { + if (vmap_pages_range(addr, addr + size, prot, pages, PAGE_SHIFT) < 0) { vm_unmap_ram(mem, count); return NULL; } @@ -2230,7 +2261,7 @@ void __init vmalloc_init(void) int map_kernel_range_noflush(unsigned long addr, unsigned long size, pgprot_t prot, struct page **pages) { - return vmap_pages_range_noflush(addr, addr + size, prot, pages); + return vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT); } /** @@ -2277,7 +2308,7 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages) unsigned long end = addr + get_vm_area_size(area); int err; - err = vmap_pages_range(addr, end, prot, pages); + err = vmap_pages_range(addr, end, prot, pages, PAGE_SHIFT); return err > 0 ? 
0 : err; } @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, if (unlikely(!size)) return NULL; - if (flags & VM_IOREMAP) - align = 1ul << clamp_t(int, get_count_order_long(size), - PAGE_SHIFT, IOREMAP_MAX_ORDER); + if (flags & VM_IOREMAP) { + align = max(align, + 1ul << clamp_t(int, get_count_order_long(size), + PAGE_SHIFT, IOREMAP_MAX_ORDER)); + } area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node); if (unlikely(!area)) @@ -2534,7 +2567,7 @@ static void __vunmap(const void *addr, int deallocate_pages) struct page *page = area->pages[i]; BUG_ON(!page); - __free_pages(page, 0); + __free_pages(page, area->page_order); } atomic_long_sub(area->nr_pages, &nr_vmalloc_pages); @@ -2672,26 +2705,29 @@ void *vmap(struct page **pages, unsigned int count, EXPORT_SYMBOL(vmap); static void *__vmalloc_node(unsigned long size, unsigned long align, - gfp_t gfp_mask, pgprot_t prot, - int node, const void *caller); + gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, + int node, const void *caller); static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, - pgprot_t prot, int node) + pgprot_t prot, unsigned int page_shift, + int node) { struct page **pages; + unsigned long addr = (unsigned long)area->addr; + unsigned long size = get_vm_area_size(area); + unsigned int page_order = page_shift - PAGE_SHIFT; unsigned int nr_pages, array_size, i; const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN; const gfp_t highmem_mask = (gfp_mask & (GFP_DMA | GFP_DMA32)) ? - 0 : - __GFP_HIGHMEM; + 0 : __GFP_HIGHMEM; - nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; + nr_pages = size >> page_shift; array_size = (nr_pages * sizeof(struct page *)); /* Please note that the recursion is strictly bounded. 
*/ if (array_size > PAGE_SIZE) { pages = __vmalloc_node(array_size, 1, nested_gfp|highmem_mask, - PAGE_KERNEL, node, area->caller); + PAGE_KERNEL, 0, node, area->caller); } else { pages = kmalloc_node(array_size, nested_gfp, node); } @@ -2704,14 +2740,13 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, area->pages = pages; area->nr_pages = nr_pages; + area->page_order = page_order; for (i = 0; i < area->nr_pages; i++) { struct page *page; - if (node == NUMA_NO_NODE) - page = alloc_page(alloc_mask|highmem_mask); - else - page = alloc_pages_node(node, alloc_mask|highmem_mask, 0); + page = alloc_pages_node(node, + alloc_mask|highmem_mask, page_order); if (unlikely(!page)) { /* Successfully allocated i pages, free them in __vunmap() */ @@ -2725,8 +2760,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, } atomic_long_add(area->nr_pages, &nr_vmalloc_pages); - if (map_vm_area(area, prot, pages)) + if (vmap_pages_range(addr, addr + size, prot, pages, page_shift) < 0) goto fail; + return area->addr; fail: @@ -2760,22 +2796,39 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, pgprot_t prot, unsigned long vm_flags, int node, const void *caller) { - struct vm_struct *area; + struct vm_struct *area = NULL; void *addr; unsigned long real_size = size; + unsigned long real_align = align; + unsigned int shift = PAGE_SHIFT; size = PAGE_ALIGN(size); if (!size || (size >> PAGE_SHIFT) > totalram_pages()) goto fail; - area = __get_vm_area_node(real_size, align, VM_ALLOC | VM_UNINITIALIZED | + if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) && + (vm_flags & VM_HUGE_PAGES)) { + unsigned long size_per_node; + + size_per_node = size; + if (node == NUMA_NO_NODE) + size_per_node /= num_online_nodes(); + if (size_per_node >= PMD_SIZE) + shift = PMD_SHIFT; + } + +again: + align = max(real_align, 1UL << shift); + size = ALIGN(real_size, align); + + area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED | vm_flags, start, end, node, gfp_mask, caller); if (!area) goto fail; - addr = __vmalloc_area_node(area, gfp_mask, prot, node); + addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node); if (!addr) - return NULL; + goto fail; /* * In this function, newly allocated vm_struct has VM_UNINITIALIZED @@ -2789,8 +2842,16 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, return addr; fail: - warn_alloc(gfp_mask, NULL, + if (shift > PAGE_SHIFT) { + shift = PAGE_SHIFT; + goto again; + } + + if (!area) { + /* Warn for area allocation, page allocations already warn */ + warn_alloc(gfp_mask, NULL, "vmalloc: allocation failure: %lu bytes", real_size); + } return NULL; } @@ -2825,16 +2886,19 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range); * Return: pointer to the allocated memory or %NULL on error */ static void *__vmalloc_node(unsigned long size, unsigned long align, - gfp_t gfp_mask, pgprot_t prot, - int node, const void *caller) + gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, + int node, const void *caller) { return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, - gfp_mask, prot, 0, node, caller); + gfp_mask, prot, vm_flags, node, caller); } void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) { - return __vmalloc_node(size, 1, gfp_mask, prot, NUMA_NO_NODE, + unsigned long vm_flags = 0; + if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) + vm_flags |= VM_HUGE_PAGES; + return __vmalloc_node(size, 1, gfp_mask, prot, vm_flags, NUMA_NO_NODE, __builtin_return_address(0)); } 
EXPORT_SYMBOL(__vmalloc); @@ -2842,7 +2906,7 @@ EXPORT_SYMBOL(__vmalloc); static inline void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags) { - return __vmalloc_node(size, 1, flags, PAGE_KERNEL, + return __vmalloc_node(size, 1, flags, PAGE_KERNEL, VM_HUGE_PAGES, node, __builtin_return_address(0)); } @@ -2850,7 +2914,8 @@ static inline void *__vmalloc_node_flags(unsigned long size, void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags, void *caller) { - return __vmalloc_node(size, 1, flags, PAGE_KERNEL, node, caller); + return __vmalloc_node(size, 1, flags, PAGE_KERNEL, VM_HUGE_PAGES, + node, caller); } /** @@ -2925,7 +2990,7 @@ EXPORT_SYMBOL(vmalloc_user); */ void *vmalloc_node(unsigned long size, int node) { - return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL, + return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL, VM_HUGE_PAGES, node, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_node); @@ -3014,7 +3079,7 @@ void *vmalloc_exec(unsigned long size) */ void *vmalloc_32(unsigned long size) { - return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL, + return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_32);
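
To summarize the allocation strategy this patch adds, here is a standalone
sketch (not kernel code) of the control flow in the reworked
__vmalloc_node_range(): a PMD-sized page order is chosen when each node's
share of the request is at least PMD_SIZE, and the allocation is retried
with small pages if the huge-page attempt fails. The try_alloc() stand-in
and the constants are invented for the example.

	#include <stdio.h>
	#include <stdbool.h>

	#define PAGE_SHIFT	12
	#define PMD_SHIFT	21
	#define PMD_SIZE	(1UL << PMD_SHIFT)

	/* stand-in for the page allocator: pretend very large huge requests fail */
	static bool try_alloc(unsigned long size, unsigned int shift)
	{
		return shift == PAGE_SHIFT || size <= 32 * PMD_SIZE;
	}

	static unsigned int alloc_with_fallback(unsigned long size, int nr_nodes)
	{
		unsigned long size_per_node = size / nr_nodes;
		unsigned int shift = PAGE_SHIFT;

		if (size_per_node >= PMD_SIZE)
			shift = PMD_SHIFT;
	again:
		if (try_alloc(size, shift))
			return shift;
		if (shift > PAGE_SHIFT) {
			shift = PAGE_SHIFT;	/* huge attempt failed: retry small */
			goto again;
		}
		return 0;			/* out of memory */
	}

	int main(void)
	{
		printf("16MB across 2 nodes  -> page shift %u\n",
		       alloc_with_fallback(16UL << 20, 2));
		printf("64KB across 2 nodes  -> page shift %u\n",
		       alloc_with_fallback(64UL << 10, 2));
		printf("256MB across 1 node  -> page shift %u\n",
		       alloc_with_fallback(256UL << 20, 1));
		return 0;
	}

The first call takes the huge-page path, the second stays on small pages
because the per-node share is below PMD_SIZE, and the third demonstrates the
fall-back-and-retry loop when the huge attempt does not succeed.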