From patchwork Mon Jul 24 19:07:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 13325277 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 992D7EB64DD for ; Mon, 24 Jul 2023 19:10:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 369E16B0078; Mon, 24 Jul 2023 15:10:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 31A396B007E; Mon, 24 Jul 2023 15:10:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 194D66B0080; Mon, 24 Jul 2023 15:10:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 081D16B0078 for ; Mon, 24 Jul 2023 15:10:48 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id CEB4BB21D3 for ; Mon, 24 Jul 2023 19:10:47 +0000 (UTC) X-FDA: 81047447334.20.FFA4EF9 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf26.hostedemail.com (Postfix) with ESMTP id 5A6C9140009 for ; Mon, 24 Jul 2023 19:10:45 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=GCTz3fSI; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf26.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690225845; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Pf3xwM0gLXSFKTe/bRRBnDec6Iu7xBNkpvjv5p1v/QI=; b=hdRSOytbR0JzudD1mOmq3WOdzlVa4Ak9GD3DEPO1abn6cYMcErjDiU+ui53Y1NuTOWR1l4 Mt3ROG6LrFYMNs/hqYknQv3NFsQGnCyb9vSInZBYpNmwm8uhv7GaLURk2zusG6fx9U7Cg7 rP41HJt5hDA1Dfixo9f+mCnwhlFsKvY= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=GCTz3fSI; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf26.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690225845; a=rsa-sha256; cv=none; b=3A1Da+Iyj8euMHIGwEdEJzHQKna9gFCrwiWVl60ZD03A6N8iaTs7owzJfrL/eSTsqh1+UV K5EphAcHkbI4jX+APh0jjXa8WfAn+9WDRKl1k4YN96fgEn9eyVVSjXwU9yRgbyxLcZGwBp ja+rk2JaMXcPR46E8XxpXpVDKj5ahHM= Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 36OIUrbL003100; Mon, 24 Jul 2023 19:10:34 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=Pf3xwM0gLXSFKTe/bRRBnDec6Iu7xBNkpvjv5p1v/QI=; b=GCTz3fSIdd5TpMyjIbi7tXylq9d7AHf+Kjd8g80zc7JwvOXAmYbrS1VJzYg0JVDdt39F G9xaL39cwG437+cOIO5bq6CzeDl/+DqAT8esiG6pRjR6ocxJ5moKxdxFmGygy87LNB+h cfss5UsaNE9xPIkYyUVn9N6a3neFF1n659nvQyhtBmwMIeOdzl9gsnbotY7uaEPMGK6f G5WSHFAJJgrZDuaj6YIAB70ezLGOpzUrPl+CUvu6IN9463s42NyHu8Hplq6DruXNH/6f hMQCzSZyBlZQ5z/+5kpxVRiXnJmsEkouXt6wdVCKNMoMLLGjxY1gHeWaoCzhTX0qEMhy rA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s1w6d37se-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 24 Jul 2023 19:10:33 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 36OJ4ksS024877; Mon, 24 Jul 2023 19:10:33 GMT Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s1w6d37s3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 24 Jul 2023 19:10:33 +0000 Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 36OHH0JM014374; Mon, 24 Jul 2023 19:10:31 GMT Received: from smtprelay03.wdc07v.mail.ibm.com ([172.16.1.70]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3s0stxp1nx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 24 Jul 2023 19:10:31 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay03.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 36OJAVWk52035968 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 24 Jul 2023 19:10:31 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4121D5805B; Mon, 24 Jul 2023 19:10:31 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 67B0758059; Mon, 24 Jul 2023 19:10:26 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.177.66.22]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP; Mon, 24 Jul 2023 19:10:26 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com, christophe.leroy@csgroup.eu Cc: Oscar Salvador , Mike Kravetz , Dan Williams , Joao Martins , Catalin Marinas , Muchun Song , Will Deacon , "Aneesh Kumar K.V" Subject: [PATCH v6 11/13] powerpc/book3s64/radix: Add support for vmemmap optimization for radix Date: Tue, 25 Jul 2023 00:37:57 +0530 Message-ID: <20230724190759.483013-12-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230724190759.483013-1-aneesh.kumar@linux.ibm.com> References: <20230724190759.483013-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: vIhe8sBbq9Cums3H7h1Ilf0dvNPPDvJB X-Proofpoint-GUID: hNYWefVGD8WvHQi_f6ww-ACI53mpPRiF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-07-24_14,2023-07-24_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 mlxscore=0 malwarescore=0 adultscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 spamscore=0 mlxlogscore=999 lowpriorityscore=0 bulkscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2307240168 X-Rspamd-Queue-Id: 5A6C9140009 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: qkbnu7n4z3eyfeq4r8prm9o3o4bxsect X-HE-Tag: 1690225845-145531 X-HE-Meta: U2FsdGVkX1+R8bSePfA29aixvvYQbq98tJyjUBbSWI8LWPw2BsTGeA+xU6ZnoCwZqEvegfGBQaQjWO9u7sUA8UkeOUThGE8e2lJ3VCbF9S+tLI8Nl28TWW5m89DN6D51/HvJoC2/o/PxP6Ahd7JPGPtUEoZYXck5//F6WmVZSJvHX5yux0agqLInTTZujhHmExb3a5+QjigKYQjgk9zlc6QAMLzPCOYY2Z+Oibh7jsXt2OJVDBeBwrZ5Gs1K/85CxwgvLTDSzDOkvf0G0YjL31NxKMHbjjuNg5dCMoKCnaNXYT698T40It792oGNhhO8CmgxThO0TP6d7szApebr/SHUQ7u/VtrHXNZZVWYEVvQVZQa5G+3TwxBACost6oE3rNMT0ue23ZJ0NUibLuw/eTkKp8o+bZHdnnjx08mQ3WFkDiSryIdWzrL/qjFtP8zmRLYiydXV97t/zFenXIQ6ZwdzznTZpXduCqcdRuamOQcQbbErIi1MVs5OT1OxPLl/4PZcGgwF7Bj4vIz2+bEZuyee8A3Fjj6D1fj3VHVmOLK1lwHHPUN69Ywqr52yrm1OWL45UPvOX6V+QMMBanUZVFA/gnM71wa3dVxuVBdf6daFqOV9EfXhUxDQgO0XW1IYQCzuo0FzV3ZgGlxJNauIastgXgKXnlosoNaEYmX88r9u88ZXYm0ob70f29dhXUgbP6/2yi8ARHKrF1JE9vID2r08UjzYdkrxgDAwB3UwwWvbeI1P7FtR+R6q3yVEW8MCxFw92gJyGgDp79HqNyTlGHNaxa5AgpRKX9/lE5lwPp6u47BeJ/N2Z0IalCSKf+KYIY04LLNgLBxJI0tQYLN/hbhA8m7mn7cnwv+LjgxxAgdME6MKeiA98WNjjI8Ahyqw74wh7Sw/aZpBq/uE7iHJC/acV46hovQ+RJ1nm6aTSrZ8fPyLyV9oTwueoR5uKFGj0m2ZQG3xrH2KaPODF4M e5HoWyUe FDQ9mwxFiV+4WOYW8RWzGYeX5OhdSYqOXR6W7EsetWPafhWRZO4xPbiYMDytaH95zr3XaRCAKcnS9hMXiajcgM+A6OR+umDVM/oTGczipy3RIT7RAZgN2S5mmyZvbPAcSG36/Kb+pmOWf4lI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: With 2M PMD-level mapping, we require 32 struct pages and a single vmemmap page can contain 1024 struct pages (PAGE_SIZE/sizeof(struct page)). Hence with 64K page size, we don't use vmemmap deduplication for PMD-level mapping. Signed-off-by: Aneesh Kumar K.V --- Documentation/mm/vmemmap_dedup.rst | 1 + Documentation/powerpc/index.rst | 1 + Documentation/powerpc/vmemmap_dedup.rst | 101 ++++++++++ arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/book3s/64/radix.h | 9 + arch/powerpc/mm/book3s64/radix_pgtable.c | 203 +++++++++++++++++++++ 6 files changed, 316 insertions(+) create mode 100644 Documentation/powerpc/vmemmap_dedup.rst diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst index a4b12ff906c4..c573e08b5043 100644 --- a/Documentation/mm/vmemmap_dedup.rst +++ b/Documentation/mm/vmemmap_dedup.rst @@ -210,6 +210,7 @@ the device (altmap). The following page sizes are supported in DAX: PAGE_SIZE (4K on x86_64), PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64). +For powerpc equivalent details see Documentation/powerpc/vmemmap_dedup.rst The differences with HugeTLB are relatively minor. diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst index d33b554ca7ba..a50834798454 100644 --- a/Documentation/powerpc/index.rst +++ b/Documentation/powerpc/index.rst @@ -36,6 +36,7 @@ powerpc ultravisor vas-api vcpudispatch_stats + vmemmap_dedup features diff --git a/Documentation/powerpc/vmemmap_dedup.rst b/Documentation/powerpc/vmemmap_dedup.rst new file mode 100644 index 000000000000..dc4db59fdf87 --- /dev/null +++ b/Documentation/powerpc/vmemmap_dedup.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========== +Device DAX +========== + +The device-dax interface uses the tail deduplication technique explained in +Documentation/mm/vmemmap_dedup.rst + +On powerpc, vmemmap deduplication is only used with radix MMU translation. Also +with a 64K page size, only the devdax namespace with 1G alignment uses vmemmap +deduplication. + +With 2M PMD level mapping, we require 32 struct pages and a single 64K vmemmap +page can contain 1024 struct pages (64K/sizeof(struct page)). Hence there is no +vmemmap deduplication possible. + +With 1G PUD level mapping, we require 16384 struct pages and a single 64K +vmemmap page can contain 1024 struct pages (64K/sizeof(struct page)). Hence we +require 16 64K pages in vmemmap to map the struct page for 1G PUD level mapping. + +Here's how things look like on device-dax after the sections are populated:: + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 15 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + + +With 4K page size, 2M PMD level mapping requires 512 struct pages and a single +4K vmemmap page contains 64 struct pages(4K/sizeof(struct page)). Hence we +require 8 4K pages in vmemmap to map the struct page for 2M pmd level mapping. + +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PMD | +-----------+ | | | + | level | | 5 | ----------------------+ | | + | mapping | +-----------+ | | + | | | 6 | ------------------------+ | + | | +-----------+ | + | | | 7 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + +With 1G PUD level mapping, we require 262144 struct pages and a single 4K +vmemmap page can contain 64 struct pages (4K/sizeof(struct page)). Hence we +require 4096 4K pages in vmemmap to map the struct pages for 1G PUD level +mapping. + +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 4095 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 9222c138c457..d0497d13f5b4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -174,6 +174,7 @@ config PPC select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WANT_IRQS_OFF_ACTIVATE_MM select ARCH_WANT_LD_ORPHAN_WARN + select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if PPC_RADIX_MMU select ARCH_WANTS_MODULES_DATA_IN_VMALLOC if PPC_BOOK3S_32 || PPC_8xx select ARCH_WEAK_RELEASE_ACQUIRE select BINFMT_ELF diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index f1461289643a..3195f268ed7f 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -326,6 +326,7 @@ static inline pud_t radix__pud_mkdevmap(pud_t pud) } struct vmem_altmap; +struct dev_pagemap; extern int __meminit radix__vmemmap_create_mapping(unsigned long start, unsigned long page_size, unsigned long phys); @@ -363,5 +364,13 @@ int radix__remove_section_mapping(unsigned long start, unsigned long end); void radix__kernel_map_pages(struct page *page, int numpages, int enable); +#define vmemmap_can_optimize vmemmap_can_optimize +bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap); + +#define vmemmap_populate_compound_pages vmemmap_populate_compound_pages +int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn, + unsigned long start, + unsigned long end, int node, + struct dev_pagemap *pgmap); #endif /* __ASSEMBLY__ */ #endif diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 53f8340e390c..e5356ac37e99 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -986,6 +986,15 @@ int __meminit radix__vmemmap_create_mapping(unsigned long start, return 0; } + +bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap) +{ + if (radix_enabled()) + return __vmemmap_can_optimize(altmap, pgmap); + + return false; +} + int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node, unsigned long addr, unsigned long next) { @@ -1193,6 +1202,200 @@ int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, in return 0; } +static pte_t * __meminit radix__vmemmap_populate_address(unsigned long addr, int node, + struct vmem_altmap *altmap, + struct page *reuse) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + pgd = pgd_offset_k(addr); + p4d = p4d_offset(pgd, addr); + pud = vmemmap_pud_alloc(p4d, node, addr); + if (!pud) + return NULL; + pmd = vmemmap_pmd_alloc(pud, node, addr); + if (!pmd) + return NULL; + if (pmd_leaf(*pmd)) + /* + * The second page is mapped as a hugepage due to a nearby request. + * Force our mapping to page size without deduplication + */ + return NULL; + pte = vmemmap_pte_alloc(pmd, node, addr); + if (!pte) + return NULL; + radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + return pte; +} + +static pte_t * __meminit vmemmap_compound_tail_page(unsigned long addr, + unsigned long pfn_offset, int node) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + unsigned long map_addr; + + /* the second vmemmap page which we use for duplication */ + map_addr = addr - pfn_offset * sizeof(struct page) + PAGE_SIZE; + pgd = pgd_offset_k(map_addr); + p4d = p4d_offset(pgd, map_addr); + pud = vmemmap_pud_alloc(p4d, node, map_addr); + if (!pud) + return NULL; + pmd = vmemmap_pmd_alloc(pud, node, map_addr); + if (!pmd) + return NULL; + if (pmd_leaf(*pmd)) + /* + * The second page is mapped as a hugepage due to a nearby request. + * Force our mapping to page size without deduplication + */ + return NULL; + pte = vmemmap_pte_alloc(pmd, node, map_addr); + if (!pte) + return NULL; + /* + * Check if there exist a mapping to the left + */ + if (pte_none(*pte)) { + /* + * Populate the head page vmemmap page. + * It can fall in different pmd, hence + * vmemmap_populate_address() + */ + pte = radix__vmemmap_populate_address(map_addr - PAGE_SIZE, node, NULL, NULL); + if (!pte) + return NULL; + /* + * Populate the tail pages vmemmap page + */ + pte = radix__vmemmap_pte_populate(pmd, map_addr, node, NULL, NULL); + if (!pte) + return NULL; + vmemmap_verify(pte, node, map_addr, map_addr + PAGE_SIZE); + return pte; + } + return pte; +} + +int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn, + unsigned long start, + unsigned long end, int node, + struct dev_pagemap *pgmap) +{ + /* + * we want to map things as base page size mapping so that + * we can save space in vmemmap. We could have huge mapping + * covering out both edges. + */ + unsigned long addr; + unsigned long addr_pfn = start_pfn; + unsigned long next; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + for (addr = start; addr < end; addr = next) { + + pgd = pgd_offset_k(addr); + p4d = p4d_offset(pgd, addr); + pud = vmemmap_pud_alloc(p4d, node, addr); + if (!pud) + return -ENOMEM; + pmd = vmemmap_pmd_alloc(pud, node, addr); + if (!pmd) + return -ENOMEM; + + if (pmd_leaf(READ_ONCE(*pmd))) { + /* existing huge mapping. Skip the range */ + addr_pfn += (PMD_SIZE >> PAGE_SHIFT); + next = pmd_addr_end(addr, end); + continue; + } + pte = vmemmap_pte_alloc(pmd, node, addr); + if (!pte) + return -ENOMEM; + if (!pte_none(*pte)) { + /* + * This could be because we already have a compound + * page whose VMEMMAP_RESERVE_NR pages were mapped and + * this request fall in those pages. + */ + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } else { + unsigned long nr_pages = pgmap_vmemmap_nr(pgmap); + unsigned long pfn_offset = addr_pfn - ALIGN_DOWN(addr_pfn, nr_pages); + pte_t *tail_page_pte; + + /* + * if the address is aligned to huge page size it is the + * head mapping. + */ + if (pfn_offset == 0) { + /* Populate the head page vmemmap page */ + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + /* + * Populate the tail pages vmemmap page + * It can fall in different pmd, hence + * vmemmap_populate_address() + */ + pte = radix__vmemmap_populate_address(addr + PAGE_SIZE, node, NULL, NULL); + if (!pte) + return -ENOMEM; + + addr_pfn += 2; + next = addr + 2 * PAGE_SIZE; + continue; + } + /* + * get the 2nd mapping details + * Also create it if that doesn't exist + */ + tail_page_pte = vmemmap_compound_tail_page(addr, pfn_offset, node); + if (!tail_page_pte) { + + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, NULL); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } + + pte = radix__vmemmap_pte_populate(pmd, addr, node, NULL, pte_page(*tail_page_pte)); + if (!pte) + return -ENOMEM; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + addr_pfn += 1; + next = addr + PAGE_SIZE; + continue; + } + } + return 0; +} + + #ifdef CONFIG_MEMORY_HOTPLUG void __meminit radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size) {