From patchwork Thu Mar 21 22:08:00 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13599440
From: peterx@redhat.com
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, Michael Ellerman, Christophe Leroy,
    Matthew Wilcox, Rik van Riel, Lorenzo Stoakes, Axel Rasmussen,
    peterx@redhat.com, Yang Shi, John Hubbard,
    linux-arm-kernel@lists.infradead.org, Kirill A. Shutemov,
    Andrew Jones, Vlastimil Babka, Mike Rapoport, Andrew Morton,
    Muchun Song, Christoph Hellwig, linux-riscv@lists.infradead.org,
    James Houghton, David Hildenbrand, Jason Gunthorpe,
    Andrea Arcangeli, Aneesh Kumar K.V, Mike Kravetz
V" , Mike Kravetz Subject: [PATCH v3 10/12] mm/gup: Handle huge pmd for follow_pmd_mask() Date: Thu, 21 Mar 2024 18:08:00 -0400 Message-ID: <20240321220802.679544-11-peterx@redhat.com> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240321220802.679544-1-peterx@redhat.com> References: <20240321220802.679544-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240321_220831_199020_82EA459B X-CRM114-Status: GOOD ( 23.58 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Peter Xu Replace pmd_trans_huge() with pmd_leaf() to also cover pmd_huge() as long as enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not yet huge. Since now follow_trans_huge_pmd() can process hugetlb pages, renaming it into follow_huge_pmd() to match what it does. Move it into gup.c so not depend on CONFIG_THP. When at it, move the ctx->page_mask setup into follow_huge_pmd(), only set it when the page is valid. It was not a bug to set it before even if GUP failed (page==NULL), because follow_page_mask() callers always ignores page_mask if so. But doing so makes the code cleaner. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- mm/gup.c | 107 ++++++++++++++++++++++++++++++++++++++++++++--- mm/huge_memory.c | 86 +------------------------------------ mm/internal.h | 5 +-- 3 files changed, 105 insertions(+), 93 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index ae21afb9434e..00cdf4cb0cd4 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -580,6 +580,93 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma, return page; } + +/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */ +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, + struct vm_area_struct *vma, + unsigned int flags) +{ + /* If the pmd is writable, we can write to the page. */ + if (pmd_write(pmd)) + return true; + + /* Maybe FOLL_FORCE is set to override it? */ + if (!(flags & FOLL_FORCE)) + return false; + + /* But FOLL_FORCE has no effect on shared mappings */ + if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED)) + return false; + + /* ... or read-only private ones */ + if (!(vma->vm_flags & VM_MAYWRITE)) + return false; + + /* ... or already writable ones that just need to take a write fault */ + if (vma->vm_flags & VM_WRITE) + return false; + + /* + * See can_change_pte_writable(): we broke COW and could map the page + * writable if we have an exclusive anonymous page ... + */ + if (!page || !PageAnon(page) || !PageAnonExclusive(page)) + return false; + + /* ... and a write-fault isn't required for other reasons. 
 mm/gup.c         | 107 ++++++++++++++++++++++++++++++++++++++++++++---
 mm/huge_memory.c |  86 +------------------------------------
 mm/internal.h    |   5 +--
 3 files changed, 105 insertions(+), 93 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ae21afb9434e..00cdf4cb0cd4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -580,6 +580,93 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 	return page;
 }
 
+/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
+{
+	/* If the pmd is writable, we can write to the page. */
+	if (pmd_write(pmd))
+		return true;
+
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+		return false;
+	return !userfaultfd_huge_pmd_wp(vma, pmd);
+}
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+				    unsigned long addr, pmd_t *pmd,
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t pmdval = *pmd;
+	struct page *page;
+	int ret;
+
+	assert_spin_locked(pmd_lockptr(mm, pmd));
+
+	page = pmd_page(pmdval);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pmd(pmdval, page, vma, flags))
+		return NULL;
+
+	/* Avoid dumping huge zero page */
+	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(pmdval))
+		return ERR_PTR(-EFAULT);
+
+	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
+		return NULL;
+
+	if (!pmd_write(pmdval) && gup_must_unshare(vma, flags, page))
+		return ERR_PTR(-EMLINK);
+
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+		       !PageAnonExclusive(page), page);
+
+	ret = try_grab_page(page, flags);
+	if (ret)
+		return ERR_PTR(ret);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (pmd_trans_huge(pmdval) && (flags & FOLL_TOUCH))
+		touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
+#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
+
+	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
+	ctx->page_mask = HPAGE_PMD_NR - 1;
+	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+
+	return page;
+}
+
 #else  /* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
 				    unsigned long addr, pud_t *pudp,
@@ -587,6 +674,14 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 {
 	return NULL;
 }
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+				    unsigned long addr, pmd_t *pmd,
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
+{
+	return NULL;
+}
 #endif	/* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 
 static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
@@ -784,31 +879,31 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			return page;
 		return no_page_table(vma, flags, address);
 	}
-	if (likely(!pmd_trans_huge(pmdval)))
+	if (likely(!pmd_leaf(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags, address);
 
 	ptl = pmd_lock(mm, pmd);
-	if (unlikely(!pmd_present(*pmd))) {
+	pmdval = *pmd;
+	if (unlikely(!pmd_present(pmdval))) {
 		spin_unlock(ptl);
 		return no_page_table(vma, flags, address);
 	}
-	if (unlikely(!pmd_trans_huge(*pmd))) {
+	if (unlikely(!pmd_leaf(pmdval))) {
 		spin_unlock(ptl);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-	if (flags & FOLL_SPLIT_PMD) {
+	if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
 		spin_unlock(ptl);
 		split_huge_pmd(vma, pmd, address);
 		/* If pmd was left empty, stuff a page table in there quickly */
 		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
 			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-	page = follow_trans_huge_pmd(vma, address, pmd, flags);
+	page = follow_huge_pmd(vma, address, pmd, flags, ctx);
 	spin_unlock(ptl);
-	ctx->page_mask = HPAGE_PMD_NR - 1;
 	return page;
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f8bd2012bc27..e747dacb5051 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1206,8 +1206,8 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
 EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
-static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
-		      pmd_t *pmd, bool write)
+void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
+	       pmd_t *pmd, bool write)
 {
 	pmd_t _pmd;
 
@@ -1562,88 +1562,6 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
 	return pmd_dirty(pmd);
 }
 
-/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
-static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
-					struct vm_area_struct *vma,
-					unsigned int flags)
-{
-	/* If the pmd is writable, we can write to the page. */
-	if (pmd_write(pmd))
-		return true;
-
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
-		return false;
-
-	/* ... and a write-fault isn't required for other reasons. */
-	if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
-		return false;
-	return !userfaultfd_huge_pmd_wp(vma, pmd);
-}
-
-struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
-				   unsigned long addr,
-				   pmd_t *pmd,
-				   unsigned int flags)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	struct page *page;
-	int ret;
-
-	assert_spin_locked(pmd_lockptr(mm, pmd));
-
-	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
-
-	if ((flags & FOLL_WRITE) &&
-	    !can_follow_write_pmd(*pmd, page, vma, flags))
-		return NULL;
-
-	/* Avoid dumping huge zero page */
-	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
-		return ERR_PTR(-EFAULT);
-
-	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
-		return NULL;
-
-	if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
-		return ERR_PTR(-EMLINK);
-
-	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
-		       !PageAnonExclusive(page), page);
-
-	ret = try_grab_page(page, flags);
-	if (ret)
-		return ERR_PTR(ret);
-
-	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
-
-	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
-	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
-
-	return page;
-}
-
 /* NUMA hinting page fault entry point for trans huge pmds */
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 {
diff --git a/mm/internal.h b/mm/internal.h
index 63e4f6e001be..d47862e6d968 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1104,9 +1104,8 @@ int __must_check try_grab_page(struct page *page, unsigned int flags);
  */
 void touch_pud(struct vm_area_struct *vma, unsigned long addr,
 	       pud_t *pud, bool write);
-struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
-				   unsigned long addr, pmd_t *pmd,
-				   unsigned int flags);
+void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
+	       pmd_t *pmd, bool write);
 
 #ifdef CONFIG_MEMCG
 static inline