From patchwork Mon Jul 10 16:08:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 13307373 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C551EB64DC for ; Mon, 10 Jul 2023 16:10:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 373D08E0002; Mon, 10 Jul 2023 12:10:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 323E58E0001; Mon, 10 Jul 2023 12:10:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 19E888E0002; Mon, 10 Jul 2023 12:10:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 08A198E0001 for ; Mon, 10 Jul 2023 12:10:23 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id DAE1DC01B7 for ; Mon, 10 Jul 2023 16:10:22 +0000 (UTC) X-FDA: 80996189484.19.83CBCCB Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf10.hostedemail.com (Postfix) with ESMTP id 40FABC0010 for ; Mon, 10 Jul 2023 16:10:19 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=J9OjxajT; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf10.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1689005420; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8eWjFBDf+yRxnTJPxKOQhUOIQyyFB6yshvJb80Wv8oI=; b=0hgjHnIjJ69kWTwaYdmOI0CpZ0Tyqvy+Cq38uK5F3AAAvTFJBSVWQo2iEU8D2InZuhkevL O4i6UZbYhacOB53go6hYzKCbaAWuEON2cmk4d/uB/7sUsFcX6NjYASjrst5jbaPHxSZyN6 YGjN571l6VxEtwnvFZ/ttcOKu/BIGpQ= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=J9OjxajT; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf10.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689005420; a=rsa-sha256; cv=none; b=nzgFCkg00ia7B8cUYIW9pLxf4Z+CvyHZkb+rBqsVRv10bTrOeKFwH+z7VLTN7B5B4lbeXJ o7w9nF/UEpuvMTaHVQrF4q503UigxWR7eraE+sWVRDGmNXi9PkPTHiOcgPgMTmyRQkqoae g8DZgeQDIHkRFJ60K2y1M7Q0IdJcjqY= Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 36AG2W2l000361; Mon, 10 Jul 2023 16:10:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=8eWjFBDf+yRxnTJPxKOQhUOIQyyFB6yshvJb80Wv8oI=; b=J9OjxajTo61iglypLYbDg6nwWp8fJtVuc2KdB8EtsVmF5YzGCkaDo29cOpb5YEEV+4UT et97iVDsIyH7qjPKOHG9mEl8kCLB5VHsyUgvUoBdnUCGXvP2SQYJbv34vWHMW6jkUyr6 kagNgAK2gCkG/3VirbrXa2oo39jPTy+Fe68/fJ5MCoyZWsuDGiVrQpf/Djv1KEzzzdi+ jfQnDaL7xyk5AXU2LNuIdSGg75Kg74WEFLgQ8+YUFtJfT7/b7+ke8x5KRfqvc9mDLUy+ N5hPUKjQwysvTV4dUzvhw9WrpueFUrNDuKKjnPdkGI2alU11aJfEXANG3kIUttPuif1L 1Q== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3rrn5k08bp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 10 Jul 2023 16:10:08 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 36AG3CmA002316; Mon, 10 Jul 2023 16:10:07 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3rrn5k08bb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 10 Jul 2023 16:10:07 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 36AFeVWr009382; Mon, 10 Jul 2023 16:10:06 GMT Received: from smtprelay07.wdc07v.mail.ibm.com ([9.208.129.116]) by ppma04wdc.us.ibm.com (PPS) with ESMTPS id 3rpye5jq2y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 10 Jul 2023 16:10:06 +0000 Received: from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com [10.241.53.102]) by smtprelay07.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 36AGA4d350790782 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 10 Jul 2023 16:10:05 GMT Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B6EF258060; Mon, 10 Jul 2023 16:10:04 +0000 (GMT) Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9F4A65803F; Mon, 10 Jul 2023 16:09:58 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.9.86]) by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTP; Mon, 10 Jul 2023 16:09:58 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com, christophe.leroy@csgroup.eu Cc: Oscar Salvador , Mike Kravetz , Dan Williams , Joao Martins , Catalin Marinas , Muchun Song , Will Deacon , "Aneesh Kumar K.V" Subject: [PATCH v4 11/13] powerpc/book3s64/radix: Add support for vmemmap optimization for radix Date: Mon, 10 Jul 2023 21:38:40 +0530 Message-ID: <20230710160842.56300-12-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230710160842.56300-1-aneesh.kumar@linux.ibm.com> References: <20230710160842.56300-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 1ezUKs4p5mrWP-OhtWV-y-IFI9hLWlnl X-Proofpoint-GUID: 2WTYTIbQhjMx1pcM7ekG2EPTpSFKXuMh X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-07-10_12,2023-07-06_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 impostorscore=0 phishscore=0 suspectscore=0 mlxlogscore=999 spamscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 malwarescore=0 mlxscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2307100145 X-Rspam-User: X-Stat-Signature: 9zadkuzjg66epfrimgdj4fpj9ugr18es X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 40FABC0010 X-HE-Tag: 1689005419-769926 X-HE-Meta: U2FsdGVkX1/ZQ1iFIq70ozMO25D45qLkiubQjC57MsGvROA+zRK4jQv7SQeMjdfRhbFILKzBPDHwz1GpvjVcE9dezTO7kbIV7pOVGHwLMF9rw88UW5vi+SDK8UqYVQ1Utbn0pMOalq0eAvdsS6i1Q5B4HmXFMx9nCC5EpfBYBD3HgZQUE5ceoVQKDTtQa/AQGXhF3fqSsO6Yi6yOtx8uABybWCm/fdnP4tu3HOOxh/TX08ZHhwwjxXUd264Lx2HZ37Q4qa7qbtpr09jY3M27enuXcjUyk7pH3Qek55F9C0uOlVpjRoCTjtnHFX+3tNY3w7Q0Q1uB6urdAmlL/deEtK46jkkBhFh5YnAki85GRMOL/BeZIQ58dW8+nn10mS1BTNcyLcvXe7gDxldPV9RLk/JibJJGgB3mavM/SGE31pEHxU119TRQNjTqmq60prRLVGQVFmGOr9CYDG5K9G5L8Pmn2l0jfsD6c7Abq1EaCiE/YM/eazGGRurH8pFA0Fl0g9M/gDS621lzpFAh2ycbABk5Ht53pU9sORocMMznobeeTWcVRRYplP/TIU/YZr+0bWtI514fUFD7QTeZfwwUrhdQWjwdtOp3hhthvOHycQYpzeOBDidx03dgn3Lob306zK57J6QWHkJSREZEQxii/JT7+KrdarZZoASUJFiIhDah1ExEWO9Vg01NqTTmgY6qHjxiNpX3KXjGzd29n4E+bO+9db0vUUZJf0fxFJzBwqsoE/G5TG9RDK6qa0xQMU+zOhY9fE1xYkgp27EHrEEJE+UKzzf/BU17CK+0X/r3NkbyV6zoIt07+AJACPqrMwyvXnHRG7QnG5IwsKm41zjCbUVS2zoIb8ywk5/TBbSOVu1XLc2A7A/A4Dp0lTk4aqe4095Uniw+JiceWMHtVAy8C5iL40SCPTyJqKafyHXJ/H7D3coKQrh3vPX7kFCCFVYXsedetEqldLIf+oyQgpk SaSKO978 PpKYT8IYpZdh/LHpfl1FBVHi6lNNxPjiQ1Fp7VOi9O93iex0idQLD1sisjeAau1ryoIr8E0XHUujfX2c2toIYixcxfftOOEJdlDL00Jb4c8LUtzsDWQoxiUFC50+Xxcdb0DATxXR2qFY9q/A= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: With 2M PMD-level mapping, we require 32 struct pages and a single vmemmap page can contain 1024 struct pages (PAGE_SIZE/sizeof(struct page)). Hence with 64K page size, we don't use vmemmap deduplication for PMD-level mapping. Signed-off-by: Aneesh Kumar K.V --- Documentation/mm/vmemmap_dedup.rst | 1 + Documentation/powerpc/index.rst | 1 + Documentation/powerpc/vmemmap_dedup.rst | 101 ++++++++++ arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/book3s/64/radix.h | 9 + arch/powerpc/mm/book3s64/radix_pgtable.c | 203 +++++++++++++++++++++ 6 files changed, 316 insertions(+) create mode 100644 Documentation/powerpc/vmemmap_dedup.rst diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst index a4b12ff906c4..c573e08b5043 100644 --- a/Documentation/mm/vmemmap_dedup.rst +++ b/Documentation/mm/vmemmap_dedup.rst @@ -210,6 +210,7 @@ the device (altmap). The following page sizes are supported in DAX: PAGE_SIZE (4K on x86_64), PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64). +For powerpc equivalent details see Documentation/powerpc/vmemmap_dedup.rst The differences with HugeTLB are relatively minor. diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst index d33b554ca7ba..a50834798454 100644 --- a/Documentation/powerpc/index.rst +++ b/Documentation/powerpc/index.rst @@ -36,6 +36,7 @@ powerpc ultravisor vas-api vcpudispatch_stats + vmemmap_dedup features diff --git a/Documentation/powerpc/vmemmap_dedup.rst b/Documentation/powerpc/vmemmap_dedup.rst new file mode 100644 index 000000000000..dc4db59fdf87 --- /dev/null +++ b/Documentation/powerpc/vmemmap_dedup.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========== +Device DAX +========== + +The device-dax interface uses the tail deduplication technique explained in +Documentation/mm/vmemmap_dedup.rst + +On powerpc, vmemmap deduplication is only used with radix MMU translation. Also +with a 64K page size, only the devdax namespace with 1G alignment uses vmemmap +deduplication. + +With 2M PMD level mapping, we require 32 struct pages and a single 64K vmemmap +page can contain 1024 struct pages (64K/sizeof(struct page)). Hence there is no +vmemmap deduplication possible. + +With 1G PUD level mapping, we require 16384 struct pages and a single 64K +vmemmap page can contain 1024 struct pages (64K/sizeof(struct page)). Hence we +require 16 64K pages in vmemmap to map the struct page for 1G PUD level mapping. + +Here's how things look like on device-dax after the sections are populated:: + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 15 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + + +With 4K page size, 2M PMD level mapping requires 512 struct pages and a single +4K vmemmap page contains 64 struct pages(4K/sizeof(struct page)). Hence we +require 8 4K pages in vmemmap to map the struct page for 2M pmd level mapping. + +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PMD | +-----------+ | | | + | level | | 5 | ----------------------+ | | + | mapping | +-----------+ | | + | | | 6 | ------------------------+ | + | | +-----------+ | + | | | 7 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + +With 1G PUD level mapping, we require 262144 struct pages and a single 4K +vmemmap page can contain 64 struct pages (4K/sizeof(struct page)). Hence we +require 4096 4K pages in vmemmap to map the struct pages for 1G PUD level +mapping. + +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 4095 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 0b1172cbeccb..116d6add0bb0 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -174,6 +174,7 @@ config PPC select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WANT_IRQS_OFF_ACTIVATE_MM select ARCH_WANT_LD_ORPHAN_WARN + select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if PPC_RADIX_MMU select ARCH_WANTS_MODULES_DATA_IN_VMALLOC if PPC_BOOK3S_32 || PPC_8xx select ARCH_WEAK_RELEASE_ACQUIRE select BINFMT_ELF diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index f1461289643a..3195f268ed7f 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -326,6 +326,7 @@ static inline pud_t radix__pud_mkdevmap(pud_t pud) } struct vmem_altmap; +struct dev_pagemap; extern int __meminit radix__vmemmap_create_mapping(unsigned long start, unsigned long page_size, unsigned long phys); @@ -363,5 +364,13 @@ int radix__remove_section_mapping(unsigned long start, unsigned long end); void radix__kernel_map_pages(struct page *page, int numpages, int enable); +#define vmemmap_can_optimize vmemmap_can_optimize +bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap); + +#define vmemmap_populate_compound_pages vmemmap_populate_compound_pages +int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn, + unsigned long start, + unsigned long end, int node, + struct dev_pagemap *pgmap); #endif /* __ASSEMBLY__ */ #endif diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 9a7f3707b6fb..b492b67c0b7d 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -987,6 +987,15 @@ int __meminit radix__vmemmap_create_mapping(unsigned long start, return 0; } + +bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap) +{ + if (radix_enabled()) + return __vmemmap_can_optimize(altmap, pgmap); + + return false; +} + int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node, unsigned long addr, unsigned long next) { @@ -1194,6 +1203,200 @@ int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, in return 0; } +static pte_t * __meminit radix__vmemmap_populate_address(unsigned long addr, int node, + struct vmem_altmap *altmap, + struct page *reuse) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + pgd = pgd_offset_k(addr); + p4d = p4d_offset(pgd, addr); + pud = vmemmap_pud_alloc(p4d, node, addr); + if (!pud) + return NULL; + pmd = vmemmap_pmd_alloc(pud, node, addr); + if (!pmd) + return NULL; + if (pmd_leaf(*pmd)) + /* + * The second page is mapped as a hugepage due to a nearby request. + * Force our mapping to page size without deduplication + */ + return NULL; + pte = vmemmap_pte_alloc(pmd, node, addr); + if (!pte) + return NULL; + radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + return pte; +} + +static pte_t * __meminit vmemmap_compound_tail_page(unsigned long addr, + unsigned long pfn_offset, int node) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + unsigned long map_addr; + + /* the second vmemmap page which we use for duplication */ + map_addr = addr - pfn_offset * sizeof(struct page) + PAGE_SIZE; + pgd = pgd_offset_k(map_addr); + p4d = p4d_offset(pgd, map_addr); + pud = vmemmap_pud_alloc(p4d, node, map_addr); + if (!pud) + return NULL; + pmd = vmemmap_pmd_alloc(pud, node, map_addr); + if (!pmd) + return NULL; + if (pmd_leaf(*pmd)) + /* + * The second page is mapped as a hugepage due to a nearby request. + * Force our mapping to page size without deduplication + */ + return NULL; + pte = vmemmap_pte_alloc(pmd, node, map_addr); + if (!pte) + return NULL; + /* + * Check if there exist a mapping to the left + */ + if (pte_none(*pte)) { + /* + * Populate the head page vmemmap page. + * It can fall in different pmd, hence + * vmemmap_populate_address() + */ + pte = radix__vmemmap_populate_address(map_addr - PAGE_SIZE, node, NULL, NULL); + if (!pte) + return NULL; + /* + * Populate the tail pages vmemmap page + */ + pte = radix__vmemmap_pte_populate(pmd, map_addr, node, NULL, NULL); + if (!pte) + return NULL; + vmemmap_verify(pte, node, map_addr, map_addr + PAGE_SIZE); + return pte; + } + return pte; +} + +int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn, + unsigned long start, + unsigned long end, int node, + struct dev_pagemap *pgmap) +{ + /* + * we want to map things as base page size mapping so that + * we can save space in vmemmap. We could have huge mapping + * covering out both edges. + */ + unsigned long addr; + unsigned long addr_pfn = start_pfn; + unsigned long next; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + for (addr = start; addr < end; addr = next) { + + pgd = pgd_offset_k(addr); + p4d = p4d_offset(pgd, addr); + pud = vmemmap_pud_alloc(p4d, node, addr); + if (!pud) + return -ENOMEM; + pmd = vmemmap_pmd_alloc(pud, node, addr); + if (!pmd) + return -ENOMEM; + + if (pmd_leaf(READ_ONCE(*pmd))) { + /* existing huge mapping. Skip the range */ + addr_pfn += (PMD_SIZE >> PAGE_SHIFT); + next = pmd_addr_end(addr, end); + continue; + } + pte = vmemmap_pte_alloc(pmd, node, addr); + if (!pte) + return -ENOMEM; + if (!pte_none(*pte)) { + /* + * This could be because we already have a compound + * page whose VMEMMAP_RESERVE_NR pages were mapped and + * this request fall in those pages. + */ + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } else { + unsigned long nr_pages = pgmap_vmemmap_nr(pgmap); + unsigned long pfn_offset = addr_pfn - ALIGN_DOWN(addr_pfn, nr_pages); + pte_t *tail_page_pte; + + /* + * if the address is aligned to huge page size it is the + * head mapping. + */ + if (pfn_offset == 0) { + /* Populate the head page vmemmap page */ + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + /* + * Populate the tail pages vmemmap page + * It can fall in different pmd, hence + * vmemmap_populate_address() + */ + pte = radix__vmemmap_populate_address(addr + PAGE_SIZE, node, NULL, NULL); + if (!pte) + return -ENOMEM; + + addr_pfn += 2; + next = addr + 2 * PAGE_SIZE; + continue; + } + /* + * get the 2nd mapping details + * Also create it if that doesn't exist + */ + tail_page_pte = vmemmap_compound_tail_page(addr, pfn_offset, node); + if (!tail_page_pte) { + + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } + + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, pte_page(*tail_page_pte)); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } + } + return 0; +} + + #ifdef CONFIG_MEMORY_HOTPLUG void __meminit radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size) {