diff mbox series

[v1] hugetlb: support FOLL_FORCE|FOLL_WRITE

Message ID Z1Ce6j5WiBE3kaGf@bender.morinfr.org (mailing list archive)
State New
Headers show
Series [v1] hugetlb: support FOLL_FORCE|FOLL_WRITE | expand

Commit Message

Guillaume Morin Dec. 4, 2024, 6:26 p.m. UTC
FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb
mappings.  Since 1d8d14641fd94, we explicitly reject it. However
running software on hugetlb mappings is a useful optimization.
Multiple tools allow to use that such as Intel iodlr or
libhugetlbfs.

Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Hagberg <ehagberg@janestreet.com>
Signed-off-by: Guillaume Morin <guillaume@morinfr.org>
---
 mm/gup.c     | 93 ++++++++++++++++++++++++++--------------------------
 mm/hugetlb.c | 20 ++++++-----
 2 files changed, 58 insertions(+), 55 deletions(-)

Comments

David Hildenbrand Dec. 4, 2024, 7:01 p.m. UTC | #1
On 04.12.24 19:26, Guillaume Morin wrote:

Patch prefix should likely be "mm/hugetlb: ..."

> FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb
> mappings.  Since 1d8d14641fd94, we explicitly reject it. However

"Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared 
mappings") ..."

> running software on hugetlb mappings is a useful optimization.
> Multiple tools allow to use that such as Intel iodlr or
> libhugetlbfs.

It would be better to link to the actual request where people ran into 
that when using PTRACE_POKETEXT

That hugetlb is getting used is rather obvious :)

> 
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Eric Hagberg <ehagberg@janestreet.com>
> Signed-off-by: Guillaume Morin <guillaume@morinfr.org>
> ---

[...]

>   		delayacct_wpcopy_end();
>   		return 0;
> @@ -5943,7 +5944,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
>   	spin_lock(vmf->ptl);
>   	vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h));
>   	if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) {
> -		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
> +		const bool writable = !unshare && (vma->vm_flags & VM_WRITE);
> +		pte_t newpte = make_huge_pte(vma, &new_folio->page, writable);
>   
>   		/* Break COW or unshare */
>   		huge_ptep_clear_flush(vma, vmf->address, vmf->pte);

After rebasing to [1] this hunk here can likely be dropped. 
make_huge_pte() will perform the VM_WRITE check.


[1] https://lkml.kernel.org/r/20241204153100.1967364-1-david@redhat.com
Guillaume Morin Dec. 4, 2024, 7:13 p.m. UTC | #2
On 04 Dec 20:01, David Hildenbrand wrote:
>
> On 04.12.24 19:26, Guillaume Morin wrote:
> 
> Patch prefix should likely be "mm/hugetlb: ..."
> 
> > FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb
> > mappings.  Since 1d8d14641fd94, we explicitly reject it. However
> 
> "Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared
> mappings") ..."

Will fix in v2.

> 
> > running software on hugetlb mappings is a useful optimization.
> > Multiple tools allow to use that such as Intel iodlr or
> > libhugetlbfs.
> 
> It would be better to link to the actual request where people ran into that
> when using PTRACE_POKETEXT
> 
> That hugetlb is getting used is rather obvious :)

Well, allow me to point out that I said running software on a hugetlb
mapping, not generally using hugetlb.

That said, which link are you referring to? The only discussion I am
aware of is off mailing lists.

Guillaume.
David Hildenbrand Dec. 4, 2024, 7:30 p.m. UTC | #3
On 04.12.24 20:13, Guillaume Morin wrote:
> On 04 Dec 20:01, David Hildenbrand wrote:
>>
>> On 04.12.24 19:26, Guillaume Morin wrote:
>>
>> Patch prefix should likely be "mm/hugetlb: ..."
>>
>>> FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb
>>> mappings.  Since 1d8d14641fd94, we explicitly reject it. However
>>
>> "Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared
>> mappings") ..."
> 
> Will fix in v2.
> 
>>
>>> running software on hugetlb mappings is a useful optimization.
>>> Multiple tools allow to use that such as Intel iodlr or
>>> libhugetlbfs.
>>
>> It would be better to link to the actual request where people ran into that
>> when using PTRACE_POKETEXT
>>
>> That hugetlb is getting used is rather obvious :)
> 
> Well, allow me to point out that I said running software on a hugetlb
> mapping, not generally using hugetlb.

Well, yes, but that's not really big news. People have been doing that 
(and rewriting their apps using libhugetlbfs to place text on huge pages 
pre file THP) for decades. :)

See below.

> 
> That said, which link are you referring to? The only discussion I am
> aware of is off mailing lists.

Oh, indeed, I could have sworn it was public.

I'd write something like

"Eric reported that PTRACE_POKETEXT fails when applications use hugetlb 
for mapping text using huge pages. Before commit 1d8d14641fd9, 
PTRACE_POKETEXT worked by accident, but it was buggy and silently ended 
up mapping pages writable into the page tables even though VM_WRITE was 
not set.

In general, FOLL_FORCE|FOLL_WRITE does currently not work with hugetlb. 
Let's implement FOLL_FORCE|FOLL_WRITE properly for hugetlb, such that 
what used to work in the past by accident now properly works, allowing 
applications using hugetlb for text etc. to get properly debugged.

This change might also be required to implement uprobes support for 
hugetlb [1].

[1] link to our discussion
"
kernel test robot Dec. 5, 2024, 12:39 a.m. UTC | #4
Hi Guillaume,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc1 next-20241204]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org
patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE
config: i386-buildonly-randconfig-004 (https://download.01.org/0day-ci/archive/20241205/202412050840.umPPa7cK-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050840.umPPa7cK-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412050840.umPPa7cK-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/gup.c: In function 'can_follow_write_pud':
>> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration]
     665 |         return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
         |                                                ^~~~~~~~~~~~~~
         |                                                pmd_soft_dirty
   cc1: some warnings being treated as errors


vim +665 mm/gup.c

   650	
   651	#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
   652	/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
   653	static inline bool can_follow_write_pud(pud_t pud, struct page *page,
   654						struct vm_area_struct *vma,
   655						unsigned int flags)
   656	{
   657		/* If the pud is writable, we can write to the page. */
   658		if (pud_write(pud))
   659			return true;
   660	
   661		if (!can_follow_write_common(page, vma, flags))
   662			return false;
   663	
   664		/* ... and a write-fault isn't required for other reasons. */
 > 665		return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
   666	}
   667
kernel test robot Dec. 5, 2024, 1:23 a.m. UTC | #5
Hi Guillaume,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.13-rc1 next-20241204]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org
patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE
config: x86_64-buildonly-randconfig-002-20241205 (https://download.01.org/0day-ci/archive/20241205/202412050954.m9cwNOJC-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050954.m9cwNOJC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412050954.m9cwNOJC-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from mm/gup.c:7:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   In file included from mm/gup.c:20:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                                 NR_ZONE_LRU_BASE + lru, nr_pages);
         |                                 ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/gup.c:681:33: warning: variable 'page' is uninitialized when used here [-Wuninitialized]
     681 |             !can_follow_write_pud(pud, page, vma, flags))
         |                                        ^~~~
   mm/gup.c:673:19: note: initialize the variable 'page' to silence this warning
     673 |         struct page *page;
         |                          ^
         |                           = NULL
   7 warnings generated.


vim +/page +681 mm/gup.c

   667	
   668	static struct page *follow_huge_pud(struct vm_area_struct *vma,
   669					    unsigned long addr, pud_t *pudp,
   670					    int flags, struct follow_page_context *ctx)
   671	{
   672		struct mm_struct *mm = vma->vm_mm;
   673		struct page *page;
   674		pud_t pud = *pudp;
   675		unsigned long pfn = pud_pfn(pud);
   676		int ret;
   677	
   678		assert_spin_locked(pud_lockptr(mm, pudp));
   679	
   680		if ((flags & FOLL_WRITE) &&
 > 681		    !can_follow_write_pud(pud, page, vma, flags))
   682			return NULL;
   683	
   684		if (!pud_present(pud))
   685			return NULL;
   686	
   687		pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
   688	
   689		if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
   690		    pud_devmap(pud)) {
   691			/*
   692			 * device mapped pages can only be returned if the caller
   693			 * will manage the page reference count.
   694			 *
   695			 * At least one of FOLL_GET | FOLL_PIN must be set, so
   696			 * assert that here:
   697			 */
   698			if (!(flags & (FOLL_GET | FOLL_PIN)))
   699				return ERR_PTR(-EEXIST);
   700	
   701			if (flags & FOLL_TOUCH)
   702				touch_pud(vma, addr, pudp, flags & FOLL_WRITE);
   703	
   704			ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap);
   705			if (!ctx->pgmap)
   706				return ERR_PTR(-EFAULT);
   707		}
   708	
   709		page = pfn_to_page(pfn);
   710	
   711		if (!pud_devmap(pud) && !pud_write(pud) &&
   712		    gup_must_unshare(vma, flags, page))
   713			return ERR_PTR(-EMLINK);
   714	
   715		ret = try_grab_folio(page_folio(page), 1, flags);
   716		if (ret)
   717			page = ERR_PTR(ret);
   718		else
   719			ctx->page_mask = HPAGE_PUD_NR - 1;
   720	
   721		return page;
   722	}
   723
Guillaume Morin Dec. 5, 2024, 1:42 a.m. UTC | #6
On 05 Dec  8:39, kernel test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    mm/gup.c: In function 'can_follow_write_pud':
> >> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration]
>      665 |         return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
>          |                                                ^~~~~~~~~~~~~~
>          |                                                pmd_soft_dirty
>    cc1: some warnings being treated as errors

David, how do you recommend I deal with this?

There is no prototype for pud_soft_dirty() in include/linux/pgtable.h
if CONFIG_HAVE_ARCH_SOFT_DIRTY is not set. Should I just add one?
Guillaume Morin Dec. 5, 2024, 1:56 a.m. UTC | #7
On 05 Dec  2:42, Guillaume Morin wrote:
>
> On 05 Dec  8:39, kernel test robot wrote:
> > All errors (new ones prefixed by >>):
> > 
> >    mm/gup.c: In function 'can_follow_write_pud':
> > >> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration]
> >      665 |         return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
> >          |                                                ^~~~~~~~~~~~~~
> >          |                                                pmd_soft_dirty
> >    cc1: some warnings being treated as errors
> 
> David, how do you recommend I deal with this?
> 
> There is no prototype for pud_soft_dirty() in include/linux/pgtable.h
> if CONFIG_HAVE_ARCH_SOFT_DIRTY is not set. Should I just add one?

I am going to respin v3 with that. Let me know if you think it makes
sense or you think it should be fixed some other way
kernel test robot Dec. 5, 2024, 2:05 a.m. UTC | #8
Hi Guillaume,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc1 next-20241204]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org
patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE
config: i386-buildonly-randconfig-002 (https://download.01.org/0day-ci/archive/20241205/202412050943.6b8BLXfY-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050943.6b8BLXfY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412050943.6b8BLXfY-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from mm/gup.c:7:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from mm/gup.c:20:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                                 NR_ZONE_LRU_BASE + lru, nr_pages);
         |                                 ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/gup.c:665:41: error: call to undeclared function 'pud_soft_dirty'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     665 |         return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
         |                                                ^
   mm/gup.c:665:41: note: did you mean 'pmd_soft_dirty'?
   include/linux/pgtable.h:1427:19: note: 'pmd_soft_dirty' declared here
    1427 | static inline int pmd_soft_dirty(pmd_t pmd)
         |                   ^
   3 warnings and 1 error generated.


vim +/pud_soft_dirty +665 mm/gup.c

   650	
   651	#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
   652	/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
   653	static inline bool can_follow_write_pud(pud_t pud, struct page *page,
   654						struct vm_area_struct *vma,
   655						unsigned int flags)
   656	{
   657		/* If the pud is writable, we can write to the page. */
   658		if (pud_write(pud))
   659			return true;
   660	
   661		if (!can_follow_write_common(page, vma, flags))
   662			return false;
   663	
   664		/* ... and a write-fault isn't required for other reasons. */
 > 665		return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
   666	}
   667
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 746070a1d8bf..c680edf33248 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -587,6 +587,33 @@  static struct folio *try_grab_folio_fast(struct page *page, int refs,
 }
 #endif	/* CONFIG_HAVE_GUP_FAST */
 
+/* Common code for can_follow_write_* */
+static inline bool can_follow_write_common(struct page *page,
+		struct vm_area_struct *vma, unsigned int flags)
+{
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	return page && PageAnon(page) && PageAnonExclusive(page);
+}
+
 static struct page *no_page_table(struct vm_area_struct *vma,
 				  unsigned int flags, unsigned long address)
 {
@@ -613,6 +640,22 @@  static struct page *no_page_table(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
+static inline bool can_follow_write_pud(pud_t pud, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
+{
+	/* If the pud is writable, we can write to the page. */
+	if (pud_write(pud))
+		return true;
+
+	if (!can_follow_write_common(page, vma, flags))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
+}
+
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
 				    unsigned long addr, pud_t *pudp,
 				    int flags, struct follow_page_context *ctx)
@@ -625,7 +668,8 @@  static struct page *follow_huge_pud(struct vm_area_struct *vma,
 
 	assert_spin_locked(pud_lockptr(mm, pudp));
 
-	if ((flags & FOLL_WRITE) && !pud_write(pud))
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pud(pud, page, vma, flags))
 		return NULL;
 
 	if (!pud_present(pud))
@@ -677,27 +721,7 @@  static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
 	if (pmd_write(pmd))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -798,27 +822,7 @@  static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 	if (pte_write(pte))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -1285,9 +1289,6 @@  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
-			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-			if (is_vm_hugetlb_page(vma))
-				return -EFAULT;
 			/*
 			 * We used to let the write,force case do COW in a
 			 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea2ed8e301ef..52517b7ce308 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5169,6 +5169,13 @@  static void set_huge_ptep_writable(struct vm_area_struct *vma,
 		update_mmu_cache(vma, address, ptep);
 }
 
+static void set_huge_ptep_maybe_writable(struct vm_area_struct *vma,
+					 unsigned long address, pte_t *ptep)
+{
+	if (vma->vm_flags & VM_WRITE)
+		set_huge_ptep_writable(vma, address, ptep);
+}
+
 bool is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
@@ -5802,13 +5809,6 @@  static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	if (!unshare && huge_pte_uffd_wp(pte))
 		return 0;
 
-	/*
-	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
-	 * PTE mapped R/O such as maybe_mkwrite() would do.
-	 */
-	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-		return VM_FAULT_SIGSEGV;
-
 	/* Let's take out MAP_SHARED mappings first. */
 	if (vma->vm_flags & VM_MAYSHARE) {
 		set_huge_ptep_writable(vma, vmf->address, vmf->pte);
@@ -5837,7 +5837,8 @@  static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 			SetPageAnonExclusive(&old_folio->page);
 		}
 		if (likely(!unshare))
-			set_huge_ptep_writable(vma, vmf->address, vmf->pte);
+			set_huge_ptep_maybe_writable(vma, vmf->address,
+						     vmf->pte);
 
 		delayacct_wpcopy_end();
 		return 0;
@@ -5943,7 +5944,8 @@  static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	spin_lock(vmf->ptl);
 	vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h));
 	if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) {
-		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+		const bool writable = !unshare && (vma->vm_flags & VM_WRITE);
+		pte_t newpte = make_huge_pte(vma, &new_folio->page, writable);
 
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, vmf->address, vmf->pte);