From patchwork Fri Dec  6 14:49:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guillaume Morin
X-Patchwork-Id: 13897274
Date: Fri, 6 Dec 2024 15:49:10 +0100
From: Guillaume Morin
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Muchun Song, Andrew Morton, Peter Xu,
 David Hildenbrand, Eric Hagberg
Subject: [PATCH v4] mm/hugetlb: support FOLL_FORCE|FOLL_WRITE
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Eric reported that PTRACE_POKETEXT fails when applications use hugetlb
for mapping text using huge pages. Before commit 1d8d14641fd9
("mm/hugetlb: support write-faults in shared mappings"), PTRACE_POKETEXT
worked by accident, but it was buggy and silently ended up mapping pages
writable into the page tables even though VM_WRITE was not set.

In general, FOLL_FORCE|FOLL_WRITE currently does not work with hugetlb.
Let's implement FOLL_FORCE|FOLL_WRITE properly for hugetlb, such that
what used to work in the past by accident now properly works, allowing
applications using hugetlb for text etc. to get properly debugged.

This change might also be required to implement uprobes support for
hugetlb [1].

[1] https://lore.kernel.org/lkml/ZiK50qob9yl5e0Xz@bender.morinfr.org/

Cc: Muchun Song
Cc: Andrew Morton
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Eric Hagberg
Signed-off-by: Guillaume Morin
---
Changes in v2:
 - Improved commit message
Changes in v3:
 - Fix potential uninitialized mem access in follow_huge_pud
 - define pud_soft_dirty when soft dirty is not enabled
Changes in v4:
 - Remove the soft dirty pud check
 - Remove the pud_soft_dirty added in v3

 mm/gup.c     | 95 +++++++++++++++++++++++++---------------------------
 mm/hugetlb.c | 20 ++++++-----
 2 files changed, 57 insertions(+), 58 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 746070a1d8bf..63c705ff4162 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -587,6 +587,33 @@ static struct folio *try_grab_folio_fast(struct page *page, int refs,
 }
 #endif	/* CONFIG_HAVE_GUP_FAST */
 
+/* Common code for can_follow_write_* */
+static inline bool can_follow_write_common(struct page *page,
+		struct vm_area_struct *vma, unsigned int flags)
+{
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	return page && PageAnon(page) && PageAnonExclusive(page);
+}
+
 static struct page *no_page_table(struct vm_area_struct *vma,
 				  unsigned int flags, unsigned long address)
 {
@@ -613,6 +640,18 @@ static struct page *no_page_table(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
+static inline bool can_follow_write_pud(pud_t pud, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
+{
+	/* If the pud is writable, we can write to the page. */
+	if (pud_write(pud))
+		return true;
+
+	return can_follow_write_common(page, vma, flags);
+}
+
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
 				    unsigned long addr, pud_t *pudp,
 				    int flags, struct follow_page_context *ctx)
@@ -625,13 +664,16 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 
 	assert_spin_locked(pud_lockptr(mm, pudp));
 
-	if ((flags & FOLL_WRITE) && !pud_write(pud))
+	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
+	page = pfn_to_page(pfn);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pud(pud, page, vma, flags))
 		return NULL;
 
 	if (!pud_present(pud))
 		return NULL;
 
-	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
 
 	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
 	    pud_devmap(pud)) {
@@ -653,8 +695,6 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 		return ERR_PTR(-EFAULT);
 	}
 
-	page = pfn_to_page(pfn);
-
 	if (!pud_devmap(pud) && !pud_write(pud) &&
 	    gup_must_unshare(vma, flags, page))
 		return ERR_PTR(-EMLINK);
@@ -677,27 +717,7 @@ static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
 	if (pmd_write(pmd))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -798,27 +818,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 	if (pte_write(pte))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -1285,9 +1285,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
-			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-			if (is_vm_hugetlb_page(vma))
-				return -EFAULT;
 			/*
 			 * We used to let the write,force case do COW in a
 			 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea2ed8e301ef..52517b7ce308 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5169,6 +5169,13 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma,
 		update_mmu_cache(vma, address, ptep);
 }
 
+static void set_huge_ptep_maybe_writable(struct vm_area_struct *vma,
+					 unsigned long address, pte_t *ptep)
+{
+	if (vma->vm_flags & VM_WRITE)
+		set_huge_ptep_writable(vma, address, ptep);
+}
+
 bool is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
@@ -5802,13 +5809,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	if (!unshare && huge_pte_uffd_wp(pte))
 		return 0;
 
-	/*
-	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
-	 * PTE mapped R/O such as maybe_mkwrite() would do.
-	 */
-	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-		return VM_FAULT_SIGSEGV;
-
 	/* Let's take out MAP_SHARED mappings first. */
 	if (vma->vm_flags & VM_MAYSHARE) {
 		set_huge_ptep_writable(vma, vmf->address, vmf->pte);
@@ -5837,7 +5837,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 			SetPageAnonExclusive(&old_folio->page);
 		}
 		if (likely(!unshare))
-			set_huge_ptep_writable(vma, vmf->address, vmf->pte);
+			set_huge_ptep_maybe_writable(vma, vmf->address,
+						     vmf->pte);
 
 		delayacct_wpcopy_end();
 		return 0;
@@ -5943,7 +5944,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	spin_lock(vmf->ptl);
 	vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h));
 	if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) {
-		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+		const bool writable = !unshare && (vma->vm_flags & VM_WRITE);
+		pte_t newpte = make_huge_pte(vma, &new_folio->page, writable);
 
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, vmf->address, vmf->pte);
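
Illustration only, not part of the patch: the decision that
can_follow_write_common() and can_follow_write_pud() encode can be modelled
in user space roughly as below. This is a sketch under assumptions. Every
MODEL_* constant, struct model_page and the model_* functions are
hypothetical stand-ins invented for this note; they are not kernel APIs and
the flag values do not match the kernel's. The extra "write-fault isn't
required for other reasons" checks of the pmd/pte variants are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for FOLL_* flags; values are NOT the kernel's. */
enum {
	MODEL_FOLL_WRITE = 1 << 0,
	MODEL_FOLL_FORCE = 1 << 1,
};

/* Hypothetical stand-ins for VM_* flags; values are NOT the kernel's. */
enum {
	MODEL_VM_WRITE    = 1 << 0,
	MODEL_VM_MAYWRITE = 1 << 1,
	MODEL_VM_SHARED   = 1 << 2,
	MODEL_VM_MAYSHARE = 1 << 3,
};

/* Simplified page state: only the two properties the check looks at. */
struct model_page {
	bool anon;           /* models PageAnon() */
	bool anon_exclusive; /* models PageAnonExclusive() */
};

/* Mirrors the order of checks in can_follow_write_common(). */
bool model_can_follow_write_common(const struct model_page *page,
				   unsigned int vm_flags,
				   unsigned int gup_flags)
{
	/* Without FOLL_FORCE a read-only entry is never writable. */
	if (!(gup_flags & MODEL_FOLL_FORCE))
		return false;
	/* FOLL_FORCE has no effect on shared mappings ... */
	if (vm_flags & (MODEL_VM_MAYSHARE | MODEL_VM_SHARED))
		return false;
	/* ... or on private mappings that can never become writable ... */
	if (!(vm_flags & MODEL_VM_MAYWRITE))
		return false;
	/* ... or on writable ones that just need a regular write fault. */
	if (vm_flags & MODEL_VM_WRITE)
		return false;
	/* COW must already be broken: exclusive anonymous page only. */
	return page && page->anon && page->anon_exclusive;
}

/* Mirrors can_follow_write_pud(): a writable entry short-circuits. */
bool model_can_follow_write(bool entry_writable,
			    const struct model_page *page,
			    unsigned int vm_flags, unsigned int gup_flags)
{
	if (entry_writable)
		return true;
	return model_can_follow_write_common(page, vm_flags, gup_flags);
}

/* Usage example: a private, currently read-only text mapping. */
void model_selftest(void)
{
	const struct model_page excl = { .anon = true, .anon_exclusive = true };
	const struct model_page not_excl = { .anon = true, .anon_exclusive = false };
	const unsigned int text_vma = MODEL_VM_MAYWRITE;
	const unsigned int force = MODEL_FOLL_WRITE | MODEL_FOLL_FORCE;

	/* After COW was broken, the forced write may proceed. */
	assert(model_can_follow_write(false, &excl, text_vma, force));
	/* A page that is not exclusive still needs a fault first. */
	assert(!model_can_follow_write(false, &not_excl, text_vma, force));
	/* A plain FOLL_WRITE never overrides a read-only entry. */
	assert(!model_can_follow_write(false, &excl, text_vma, MODEL_FOLL_WRITE));
	/* Shared mappings are never forced. */
	assert(!model_can_follow_write(false, &excl,
				       text_vma | MODEL_VM_MAYSHARE, force));
	/* A writable entry needs no further checks. */
	assert(model_can_follow_write(true, NULL, 0, MODEL_FOLL_WRITE));
}
```

The model keeps the writable-entry shortcut separate from the FOLL_FORCE
checks, which is the same split the patch introduces so that the pte, pmd
and pud paths can share one helper.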