From patchwork Mon May 29 06:22:40 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13258188
Date: Sun, 28 May 2023 23:22:40 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
    Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin,
    Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom,
    Russell King, "David S. Miller", Michael Ellerman,
    "Aneesh Kumar K.V", Heiko Carstens, Christian Borntraeger,
    Claudio Imbrenda, Alexander Gordeev, Jann Horn,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
In-Reply-To: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>
Message-ID: <6dd63b39-e71f-2e8b-7e0-83e02f3bcb39@google.com>
References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>

Add s390-specific pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version, to avoid build breakage from incompatible pgtable_t.

This version is more complicated than others, because page_table_free()
needs to know which fragment is being freed, and which mm to link it to.
page_table_free()'s fragment handling is clever, but I could too easily
break it: what's done here in pte_free_defer() and pte_free_now() might
be better integrated with page_table_free()'s cleverness, but not by me!

By the time that page_table_free() gets called via RCU, it's conceivable
that mm would already have been freed: so mmgrab() in pte_free_defer()
and mmdrop() in pte_free_now().  But that is not a good context to call
mmdrop() from, so make mmdrop_async() public and use that.
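For reviewers, a minimal standalone sketch (not part of the patch) of
the pointer-tagging scheme relied on below: struct mm_struct is at
least 8-byte aligned, so the low three bits of its address are zero
and can carry the index of the 2K page table fragment within its 4K
page.  The names here (fake_mm, encode_mm_frag, decode_mm_frag) are
invented for the sketch; only the arithmetic mirrors pte_free_defer()
and pte_free_now().

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct fake_mm { long dummy; } __attribute__((aligned(8)));

/*
 * Pack the fragment index into the pointer's spare low bits:
 * the counterpart of "mm_bit += (unsigned long)mm" in pte_free_defer().
 */
static uintptr_t encode_mm_frag(struct fake_mm *mm, unsigned int frag)
{
	assert(((uintptr_t)mm & 7) == 0);	/* alignment gives 3 spare bits */
	assert(frag < 8);			/* s390 only ever uses 0 or 1 */
	return (uintptr_t)mm + frag;
}

/* Recover both values, as pte_free_now() does from page->pt_mm */
static struct fake_mm *decode_mm_frag(uintptr_t tagged, unsigned int *frag)
{
	*frag = tagged & 7;				/* mm_bit & 7 */
	return (struct fake_mm *)(tagged & ~(uintptr_t)7);	/* mm_bit & ~7 */
}

int main(void)
{
	static struct fake_mm mm;
	unsigned int frag;
	uintptr_t tagged = encode_mm_frag(&mm, 1);	/* second 2K fragment */
	struct fake_mm *back = decode_mm_frag(tagged, &frag);

	assert(back == &mm && frag == 1);
	printf("mm=%p frag=%u\n", (void *)back, frag);
	return 0;
}

Note the ordering in the patch itself: mmgrab(mm) is taken before the
tagged pointer is stashed in page->pt_mm and call_rcu() is issued, so
the mm stays valid until pte_free_now() has decoded it and dropped the
reference via mmdrop_async().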
Signed-off-by: Hugh Dickins
Reviewed-by: Gerald Schaefer
---
 arch/s390/include/asm/pgalloc.h |  4 ++++
 arch/s390/mm/pgalloc.c          | 34 ++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h        |  2 +-
 include/linux/sched/mm.h        |  1 +
 kernel/fork.c                   |  2 +-
 5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..89a9d5ef94f8 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm,
 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
 #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte)
 
+/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 void vmem_map_init(void);
 void *vmem_crst_alloc(unsigned long val);
 pte_t *vmem_pte_alloc(void);
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..0129de9addfd 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -346,6 +346,40 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	__free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+	unsigned long mm_bit;
+	struct mm_struct *mm;
+	unsigned long *table;
+
+	page = container_of(head, struct page, rcu_head);
+	table = (unsigned long *)page_to_virt(page);
+	mm_bit = (unsigned long)page->pt_mm;
+	/* 4K page has only two 2K fragments, but alignment allows eight */
+	mm = (struct mm_struct *)(mm_bit & ~7);
+	table += PTRS_PER_PTE * (mm_bit & 7);
+	page_table_free(mm, table);
+	mmdrop_async(mm);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+	unsigned long mm_bit;
+
+	mmgrab(mm);
+	page = virt_to_page(pgtable);
+	/* Which 2K page table fragment of a 4K page? */
+	mm_bit = ((unsigned long)pgtable & ~PAGE_MASK) /
+		 (PTRS_PER_PTE * sizeof(pte_t));
+	mm_bit += (unsigned long)mm;
+	page->pt_mm = (struct mm_struct *)mm_bit;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 			 unsigned long vmaddr)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..1667a1bdb8a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,7 +146,7 @@ struct page {
 		pgtable_t pmd_huge_pte; /* protected by page->ptl */
 		unsigned long _pt_pad_2;	/* mapping */
 		union {
-			struct mm_struct *pt_mm; /* x86 pgds only */
+			struct mm_struct *pt_mm; /* x86 pgd, s390 */
 			atomic_t pt_frag_refcount; /* powerpc */
 		};
 #if ALLOC_SPLIT_PTLOCKS
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 8d89c8c4fac1..a9043d1a0d55 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -41,6 +41,7 @@ static inline void smp_mb__after_mmgrab(void)
 	smp_mb__after_atomic();
 }
 
+extern void mmdrop_async(struct mm_struct *mm);
 extern void __mmdrop(struct mm_struct *mm);
 
 static inline void mmdrop(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index ed4e01daccaa..fa4486b65c56 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -942,7 +942,7 @@ static void mmdrop_async_fn(struct work_struct *work)
 	__mmdrop(mm);
 }
 
-static void mmdrop_async(struct mm_struct *mm)
+void mmdrop_async(struct mm_struct *mm)
 {
 	if (unlikely(atomic_dec_and_test(&mm->mm_count))) {
 		INIT_WORK(&mm->async_put_work, mmdrop_async_fn);