From: Christophe Leroy
To: Andrew Morton, Jason Gunthorpe, Peter Xu
Cc: Christophe Leroy, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org
Subject: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
Date: Mon, 25 Mar 2024 15:55:54 +0100
Message-ID: <54d78f1b7e7f1c671e40b7c0c637380bcb834326.1711377230.git.christophe.leroy@csgroup.eu>
Unlike many architectures, the powerpc 8xx hardware tablewalk requires a two-level walk for all page sizes, although the second level only has one entry when the page size is 8M.

To fit the Linux page table topology without requiring a special page directory layout like hugepd, the page entry will be replicated 1024 times in the standard page table. However, for large pages it is necessary to set bits in the level-1 (PMD) entry. Currently, for 512k pages the flag is kept in the PTE and inserted into the PMD entry by the TLB miss exception handler; this is necessary because a single page table can contain pages of different sizes. However, all 12 PTE bits are already in use, so there is no room for an additional page size bit.

For 8M pages there will be only one page per PMD entry, so it is possible to flag the page size in the PMD entry itself, with the advantage that the information will already be where the hardware expects it.

To do so, add a new helper called pmd_populate_size() (with a pmd_populate_kernel_size() counterpart) which takes the page size as an additional argument, and modify __pte_alloc(), __pte_alloc_kernel() and pmd_install() to also take that argument. pte_alloc() is left unmodified in order to reduce churn on callers, and a pte_alloc_size() is added for use by pte_alloc_huge().

When an architecture doesn't provide pmd_populate_size(), pmd_populate() is used as a fallback.
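As a rough illustration of the 8M case described above, the toy C model below installs a page table through a size-aware populate helper that flags 8M in the PMD entry, and replicates one PTE across all 1024 slots of the table. Everything here is hypothetical: the types, the `toy_*` function names and the `TOY_PMD_*` bit values are invented for this sketch and do not match the real powerpc 8xx definitions, and the 512k TLB-miss path is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types: NOT the real powerpc 8xx definitions. */
typedef uintptr_t pmd_t;	/* level-1 entry: table address | flag bits */
typedef uint32_t  pte_t;	/* level-2 entry */

#define PTRS_PER_PTE    1024
#define SZ_512K         (512UL << 10)
#define SZ_8M           (8UL << 20)
#define TOY_PMD_PRESENT 0x1UL	/* hypothetical "table present" bit */
#define TOY_PMD_PS_8M   0x2UL	/* hypothetical "8M page" size bit */

/*
 * Size-aware populate: the page size is known when the table is installed,
 * so an 8M mapping is flagged directly in the PMD entry. Other sizes leave
 * the size bit clear. Relies on the table being at least 4-byte aligned,
 * so the two low address bits are free for flags.
 */
static void toy_pmd_populate_size(pmd_t *pmdp, pte_t *table, unsigned long sz)
{
	pmd_t val = (pmd_t)table | TOY_PMD_PRESENT;

	if (sz == SZ_8M)
		val |= TOY_PMD_PS_8M;
	*pmdp = val;
}

/* Recover the table address by masking off the toy flag bits. */
static pte_t *toy_pmd_table(pmd_t pmd)
{
	return (pte_t *)(pmd & ~(TOY_PMD_PRESENT | TOY_PMD_PS_8M));
}

/* An 8M page still uses the two-level walk, so the same PTE is written
 * into every slot of the standard page table. */
static void toy_set_8m_pte(pte_t *table, pte_t pte)
{
	for (size_t i = 0; i < PTRS_PER_PTE; i++)
		table[i] = pte;
}
```

The point of the size argument is visible in toy_pmd_populate_size(): without it, the helper that writes the PMD entry would have no way to know it should set the 8M flag.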
Signed-off-by: Christophe Leroy
---
 include/linux/mm.h | 12 +++++++-----
 mm/filemap.c       |  2 +-
 mm/internal.h      |  2 +-
 mm/memory.c        | 19 ++++++++++++-------
 mm/pgalloc-track.h |  2 +-
 mm/userfaultfd.c   |  4 ++--
 6 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c0910bc3e4a..6c5c15955d4e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2801,8 +2801,8 @@ static inline void mm_inc_nr_ptes(struct mm_struct *mm) {}
 static inline void mm_dec_nr_ptes(struct mm_struct *mm) {}
 #endif
 
-int __pte_alloc(struct mm_struct *mm, pmd_t *pmd);
-int __pte_alloc_kernel(pmd_t *pmd);
+int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long sz);
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long sz);
 
 #if defined(CONFIG_MMU)
 
@@ -2987,7 +2987,8 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	pte_unmap(pte);					\
 } while (0)
 
-#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))
+#define pte_alloc_size(mm, pmd, sz) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd, sz))
+#define pte_alloc(mm, pmd) pte_alloc_size(mm, pmd, PAGE_SIZE)
 
 #define pte_alloc_map(mm, pmd, address)			\
 	(pte_alloc(mm, pmd) ? NULL : pte_offset_map(pmd, address))
@@ -2996,9 +2997,10 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	(pte_alloc(mm, pmd) ?			\
 		 NULL : pte_offset_map_lock(mm, pmd, address, ptlp))
 
-#define pte_alloc_kernel(pmd, address)			\
-	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
+#define pte_alloc_kernel_size(pmd, address, sz)			\
+	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, sz))? \
 		NULL: pte_offset_kernel(pmd, address))
+#define pte_alloc_kernel(pmd, address)	pte_alloc_kernel_size(pmd, address, PAGE_SIZE)
 
 #if USE_SPLIT_PMD_PTLOCKS
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 7437b2bd75c1..b013000ea84f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3428,7 +3428,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
 	}
 
 	if (pmd_none(*vmf->pmd) && vmf->prealloc_pte)
-		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
+		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte, PAGE_SIZE);
 
 	return false;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..b81c3ca59f45 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -206,7 +206,7 @@ void folio_activate(struct folio *folio);
 void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 		   struct vm_area_struct *start_vma, unsigned long floor,
 		   unsigned long ceiling, bool mm_wr_locked);
-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
+void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte, unsigned long sz);
 
 struct zap_details;
 void unmap_page_range(struct mmu_gather *tlb,
diff --git a/mm/memory.c b/mm/memory.c
index f2bc6dd15eb8..c846bb75746b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -409,7 +409,12 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 	} while (vma);
 }
 
-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
+#ifndef pmd_populate_size
+#define pmd_populate_size(mm, pmdp, pte, sz) pmd_populate(mm, pmdp, pte)
+#define pmd_populate_kernel_size(mm, pmdp, pte, sz) pmd_populate_kernel(mm, pmdp, pte)
+#endif
+
+void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte, unsigned long sz)
 {
 	spinlock_t *ptl = pmd_lock(mm, pmd);
 
@@ -429,25 +434,25 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
 		 * smp_rmb() barriers in page table walking code.
 		 */
 		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
-		pmd_populate(mm, pmd, *pte);
+		pmd_populate_size(mm, pmd, *pte, sz);
 		*pte = NULL;
 	}
 	spin_unlock(ptl);
 }
 
-int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
+int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long sz)
 {
 	pgtable_t new = pte_alloc_one(mm);
 	if (!new)
 		return -ENOMEM;
 
-	pmd_install(mm, pmd, &new);
+	pmd_install(mm, pmd, &new, sz);
 	if (new)
 		pte_free(mm, new);
 	return 0;
 }
 
-int __pte_alloc_kernel(pmd_t *pmd)
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long sz)
 {
 	pte_t *new = pte_alloc_one_kernel(&init_mm);
 	if (!new)
@@ -456,7 +461,7 @@ int __pte_alloc_kernel(pmd_t *pmd)
 	spin_lock(&init_mm.page_table_lock);
 	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
 		smp_wmb(); /* See comment in pmd_install() */
-		pmd_populate_kernel(&init_mm, pmd, new);
+		pmd_populate_kernel_size(&init_mm, pmd, new, sz);
 		new = NULL;
 	}
 	spin_unlock(&init_mm.page_table_lock);
@@ -4738,7 +4743,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		}
 
 		if (vmf->prealloc_pte)
-			pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
+			pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte, PAGE_SIZE);
 		else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd)))
 			return VM_FAULT_OOM;
 	}
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index e9e879de8649..90e37de7ab77 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -45,7 +45,7 @@ static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
 
 #define pte_alloc_kernel_track(pmd, address, mask)			\
 	((unlikely(pmd_none(*(pmd))) &&					\
-	  (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
+	  (__pte_alloc_kernel(pmd, PAGE_SIZE) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
 		NULL: pte_offset_kernel(pmd, address))
 
 #endif /* _LINUX_PGALLOC_TRACK_H */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 712160cd41ec..9baf507ce193 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -764,7 +764,7 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 			break;
 		}
 		if (unlikely(pmd_none(dst_pmdval)) &&
-		    unlikely(__pte_alloc(dst_mm, dst_pmd))) {
+		    unlikely(__pte_alloc(dst_mm, dst_pmd, PAGE_SIZE))) {
 			err = -ENOMEM;
 			break;
 		}
@@ -1686,7 +1686,7 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
 				err = -ENOENT;
 				break;
 			}
-			if (unlikely(__pte_alloc(mm, src_pmd))) {
+			if (unlikely(__pte_alloc(mm, src_pmd, PAGE_SIZE))) {
 				err = -ENOMEM;
 				break;
 			}
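The call chain this patch sets up (pte_alloc() -> pte_alloc_size() -> __pte_alloc() -> pmd_install() -> pmd_populate_size(), with the #ifndef fallback to pmd_populate()) can be exercised in user space with a toy model. This is a minimal sketch under stated assumptions: the struct layouts, calloc() standing in for pte_alloc_one(), the TOY_ENOMEM constant, and the name do_pte_alloc() (used instead of __pte_alloc() because identifiers starting with a double underscore are reserved in user-space C) are all hypothetical, and the pmd_lock() and smp_wmb() of the real pmd_install() are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for kernel types. */
struct mm_struct { int unused; };
typedef struct { void *table; } pmd_t;	/* PMD entry: pointer to a PTE table */
typedef void *pgtable_t;

#define PAGE_SIZE  4096UL
#define TOY_ENOMEM 12

static int pmd_none(pmd_t pmd)
{
	return pmd.table == NULL;
}

/* Size-agnostic helper that every architecture already provides. */
static void pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t pte)
{
	(void)mm;
	pmdp->table = pte;
}

/* Same fallback as the patch: no arch override, so the size is dropped. */
#ifndef pmd_populate_size
#define pmd_populate_size(mm, pmdp, pte, sz) pmd_populate(mm, pmdp, pte)
#endif

static void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte,
			unsigned long sz)
{
	(void)sz;	/* only consumed by an arch override */
	/* The real code holds pmd_lock() and issues smp_wmb() here. */
	if (pmd_none(*pmd)) {
		pmd_populate_size(mm, pmd, *pte, sz);
		*pte = NULL;	/* ownership moved into the PMD */
	}
}

/* Stands in for __pte_alloc(); calloc() stands in for pte_alloc_one(). */
static int do_pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long sz)
{
	pgtable_t new = calloc(1024, sizeof(unsigned long));

	if (!new)
		return -TOY_ENOMEM;
	pmd_install(mm, pmd, &new, sz);
	if (new)	/* PMD was already populated: drop our table */
		free(new);
	return 0;
}

/* pte_alloc() keeps its old signature by forwarding PAGE_SIZE. */
#define pte_alloc_size(mm, pmd, sz) (pmd_none(*(pmd)) && do_pte_alloc(mm, pmd, sz))
#define pte_alloc(mm, pmd) pte_alloc_size(mm, pmd, PAGE_SIZE)
```

Because pte_alloc() simply forwards PAGE_SIZE, existing callers compile unchanged; only a caller that knows it is mapping a larger page (pte_alloc_huge(), per the commit message) needs pte_alloc_size(). A second call on an already-populated PMD short-circuits on pmd_none() and allocates nothing, and a racing allocation that loses in pmd_install() frees its own table.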