Message ID | Z1Ce6j5WiBE3kaGf@bender.morinfr.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] hugetlb: support FOLL_FORCE|FOLL_WRITE | expand |
On 04.12.24 19:26, Guillaume Morin wrote: Patch prefix should likely be "mm/hugetlb: ..." > FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb > mappings. Since 1d8d14641fd94, we explicitly reject it. However "Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared mappings") ..." > running software on hugetlb mappings is a useful optimization. > Multiple tools allow to use that such as Intel iodlr or > libhugetlbfs. It would be better to link to the actual request where people ran into that when using PTRACE_POKETEXT That hugetlb is getting used is rather obvious :) > > Cc: Muchun Song <muchun.song@linux.dev> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Peter Xu <peterx@redhat.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: Eric Hagberg <ehagberg@janestreet.com> > Signed-off-by: Guillaume Morin <guillaume@morinfr.org> > --- [...] > delayacct_wpcopy_end(); > return 0; > @@ -5943,7 +5944,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > spin_lock(vmf->ptl); > vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h)); > if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) { > - pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare); > + const bool writable = !unshare && (vma->vm_flags & VM_WRITE); > + pte_t newpte = make_huge_pte(vma, &new_folio->page, writable); > > /* Break COW or unshare */ > huge_ptep_clear_flush(vma, vmf->address, vmf->pte); After rebasing to [1] this hunk here can likely be dropped. make_huge_pte() will perform the VM_WRITE check. [1] https://lkml.kernel.org/r/20241204153100.1967364-1-david@redhat.com
On 04 Dec 20:01, David Hildenbrand wrote: > > On 04.12.24 19:26, Guillaume Morin wrote: > > Patch prefix should likely be "mm/hugetlb: ..." > > > FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb > > mappings. Since 1d8d14641fd94, we explicitly reject it. However > > "Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared > mappings") ..." Will fix in v2. > > > running software on hugetlb mappings is a useful optimization. > > Multiple tools allow to use that such as Intel iodlr or > > libhugetlbfs. > > It would be better to link to the actual request where people ran into that > when using PTRACE_POKETEXT > > That hugetlb is getting used is rather obvious :) Well, allow me to point out that I said running software on a hugetlb mapping, not generally using hugetlb. That said, which link are you referring to? The only discussion I am aware of is off mailing lists. Guillaume.
On 04.12.24 20:13, Guillaume Morin wrote: > On 04 Dec 20:01, David Hildenbrand wrote: >> >> On 04.12.24 19:26, Guillaume Morin wrote: >> >> Patch prefix should likely be "mm/hugetlb: ..." >> >>> FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb >>> mappings. Since 1d8d14641fd94, we explicitly reject it. However >> >> "Since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared >> mappings") ..." > > Will fix in v2. > >> >>> running software on hugetlb mappings is a useful optimization. >>> Multiple tools allow to use that such as Intel iodlr or >>> libhugetlbfs. >> >> It would be better to link to the actual request where people ran into that >> when using PTRACE_POKETEXT >> >> That hugetlb is getting used is rather obvious :) > > Well, allow me to point out that I said running software on a hugetlb > mapping, not generally using hugetlb. Well, yes, but that's not really big news. People have been doing that (and rewriting their apps using libhugetlbfs to place text on huge pages pre file THP) for decades. :) See below. > > That said, which link are you referring to? The only discussion I am > aware of is off mailing lists. Oh, indeed, I could have sworn it was public. I'd write something like "Eric reported that PTRACE_POKETEXT fails when applications use hugetlb for mapping text using huge pages. Before commit 1d8d14641fd9, PTRACE_POKETEXT worked by accident, but it was buggy and silently ended up mapping pages writable into the page tables even though VM_WRITE was not set. In general, FOLL_FORCE|FOLL_WRITE does currently not work with hugetlb. Let's implement FOLL_FORCE|FOLL_WRITE properly for hugetlb, such that what used to work in the past by accident now properly works, allowing applications using hugetlb for text etc. to get properly debugged. This change might also be required to implement uprobes support for hugetlb [1]. [1] link to our discussion "
Hi Guillaume, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.13-rc1 next-20241204] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE config: i386-buildonly-randconfig-004 (https://download.01.org/0day-ci/archive/20241205/202412050840.umPPa7cK-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050840.umPPa7cK-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412050840.umPPa7cK-lkp@intel.com/ All errors (new ones prefixed by >>): mm/gup.c: In function 'can_follow_write_pud': >> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration] 665 | return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); | ^~~~~~~~~~~~~~ | pmd_soft_dirty cc1: some warnings being treated as errors vim +665 mm/gup.c 650 651 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES 652 /* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */ 653 static inline bool can_follow_write_pud(pud_t pud, struct page *page, 654 struct vm_area_struct *vma, 655 unsigned int flags) 656 { 657 /* If the pud is writable, we can write to the page. */ 658 if (pud_write(pud)) 659 return true; 660 661 if (!can_follow_write_common(page, vma, flags)) 662 return false; 663 664 /* ... and a write-fault isn't required for other reasons. */ > 665 return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); 666 } 667
Hi Guillaume, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master v6.13-rc1 next-20241204] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE config: x86_64-buildonly-randconfig-002-20241205 (https://download.01.org/0day-ci/archive/20241205/202412050954.m9cwNOJC-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050954.m9cwNOJC-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412050954.m9cwNOJC-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from mm/gup.c:7: In file included from include/linux/mm.h:2223: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 505 | item]; | ~~~~ include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 512 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 525 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ In file included from mm/gup.c:20: include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~ ^ ~~~ include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 49 | NR_ZONE_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~~~~~~ ^ ~~~ >> mm/gup.c:681:33: warning: variable 'page' is uninitialized when used here [-Wuninitialized] 681 | !can_follow_write_pud(pud, page, vma, flags)) | ^~~~ mm/gup.c:673:19: note: initialize the variable 'page' to silence this warning 673 | struct page *page; | ^ | = NULL 7 warnings generated. vim +/page +681 mm/gup.c 667 668 static struct page *follow_huge_pud(struct vm_area_struct *vma, 669 unsigned long addr, pud_t *pudp, 670 int flags, struct follow_page_context *ctx) 671 { 672 struct mm_struct *mm = vma->vm_mm; 673 struct page *page; 674 pud_t pud = *pudp; 675 unsigned long pfn = pud_pfn(pud); 676 int ret; 677 678 assert_spin_locked(pud_lockptr(mm, pudp)); 679 680 if ((flags & FOLL_WRITE) && > 681 !can_follow_write_pud(pud, page, vma, flags)) 682 return NULL; 683 684 if (!pud_present(pud)) 685 return NULL; 686 687 pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT; 688 689 if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && 690 pud_devmap(pud)) { 691 /* 692 * device mapped pages can only be returned if the caller 693 * will manage the page reference count. 694 * 695 * At least one of FOLL_GET | FOLL_PIN must be set, so 696 * assert that here: 697 */ 698 if (!(flags & (FOLL_GET | FOLL_PIN))) 699 return ERR_PTR(-EEXIST); 700 701 if (flags & FOLL_TOUCH) 702 touch_pud(vma, addr, pudp, flags & FOLL_WRITE); 703 704 ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap); 705 if (!ctx->pgmap) 706 return ERR_PTR(-EFAULT); 707 } 708 709 page = pfn_to_page(pfn); 710 711 if (!pud_devmap(pud) && !pud_write(pud) && 712 gup_must_unshare(vma, flags, page)) 713 return ERR_PTR(-EMLINK); 714 715 ret = try_grab_folio(page_folio(page), 1, flags); 716 if (ret) 717 page = ERR_PTR(ret); 718 else 719 ctx->page_mask = HPAGE_PUD_NR - 1; 720 721 return page; 722 } 723
On 05 Dec 8:39, kernel test robot wrote: > All errors (new ones prefixed by >>): > > mm/gup.c: In function 'can_follow_write_pud': > >> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration] > 665 | return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); > | ^~~~~~~~~~~~~~ > | pmd_soft_dirty > cc1: some warnings being treated as errors David, how do you recommend I deal with this? There is no prototype for pud_soft_dirty() in include/linux/pgtable.h if CONFIG_HAVE_ARCH_SOFT_DIRTY is not set. Should I just add one?
On 05 Dec 2:42, Guillaume Morin wrote: > > On 05 Dec 8:39, kernel test robot wrote: > > All errors (new ones prefixed by >>): > > > > mm/gup.c: In function 'can_follow_write_pud': > > >> mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Werror=implicit-function-declaration] > > 665 | return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); > > | ^~~~~~~~~~~~~~ > > | pmd_soft_dirty > > cc1: some warnings being treated as errors > > David, how do you recommend I deal with this? > > There is no prototype for pud_soft_dirty() in include/linux/pgtable.h > if CONFIG_HAVE_ARCH_SOFT_DIRTY is not set. Should I just add one? I am going to respin v3 with that. Let me know if you think it makes sense or you think it should be fixed some other way
Hi Guillaume, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.13-rc1 next-20241204] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Guillaume-Morin/hugetlb-support-FOLL_FORCE-FOLL_WRITE/20241205-022843 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/Z1Ce6j5WiBE3kaGf%40bender.morinfr.org patch subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE config: i386-buildonly-randconfig-002 (https://download.01.org/0day-ci/archive/20241205/202412050943.6b8BLXfY-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241205/202412050943.6b8BLXfY-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412050943.6b8BLXfY-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from mm/gup.c:7: In file included from include/linux/mm.h:2223: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from mm/gup.c:20: include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~ ^ ~~~ include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 49 | NR_ZONE_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~~~~~~ ^ ~~~ >> mm/gup.c:665:41: error: call to undeclared function 'pud_soft_dirty'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 665 | return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); | ^ mm/gup.c:665:41: note: did you mean 'pmd_soft_dirty'? include/linux/pgtable.h:1427:19: note: 'pmd_soft_dirty' declared here 1427 | static inline int pmd_soft_dirty(pmd_t pmd) | ^ 3 warnings and 1 error generated. vim +/pud_soft_dirty +665 mm/gup.c 650 651 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES 652 /* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */ 653 static inline bool can_follow_write_pud(pud_t pud, struct page *page, 654 struct vm_area_struct *vma, 655 unsigned int flags) 656 { 657 /* If the pud is writable, we can write to the page. */ 658 if (pud_write(pud)) 659 return true; 660 661 if (!can_follow_write_common(page, vma, flags)) 662 return false; 663 664 /* ... and a write-fault isn't required for other reasons. */ > 665 return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); 666 } 667
diff --git a/mm/gup.c b/mm/gup.c index 746070a1d8bf..c680edf33248 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -587,6 +587,33 @@ static struct folio *try_grab_folio_fast(struct page *page, int refs, } #endif /* CONFIG_HAVE_GUP_FAST */ +/* Common code for can_follow_write_* */ +static inline bool can_follow_write_common(struct page *page, + struct vm_area_struct *vma, unsigned int flags) +{ + /* Maybe FOLL_FORCE is set to override it? */ + if (!(flags & FOLL_FORCE)) + return false; + + /* But FOLL_FORCE has no effect on shared mappings */ + if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED)) + return false; + + /* ... or read-only private ones */ + if (!(vma->vm_flags & VM_MAYWRITE)) + return false; + + /* ... or already writable ones that just need to take a write fault */ + if (vma->vm_flags & VM_WRITE) + return false; + + /* + * See can_change_pte_writable(): we broke COW and could map the page + * writable if we have an exclusive anonymous page ... + */ + return page && PageAnon(page) && PageAnonExclusive(page); +} + static struct page *no_page_table(struct vm_area_struct *vma, unsigned int flags, unsigned long address) { @@ -613,6 +640,22 @@ static struct page *no_page_table(struct vm_area_struct *vma, } #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES +/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */ +static inline bool can_follow_write_pud(pud_t pud, struct page *page, + struct vm_area_struct *vma, + unsigned int flags) +{ + /* If the pud is writable, we can write to the page. */ + if (pud_write(pud)) + return true; + + if (!can_follow_write_common(page, vma, flags)) + return false; + + /* ... and a write-fault isn't required for other reasons. */ + return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud); +} + static struct page *follow_huge_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pudp, int flags, struct follow_page_context *ctx) @@ -625,7 +668,8 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma, assert_spin_locked(pud_lockptr(mm, pudp)); - if ((flags & FOLL_WRITE) && !pud_write(pud)) + if ((flags & FOLL_WRITE) && + !can_follow_write_pud(pud, page, vma, flags)) return NULL; if (!pud_present(pud)) @@ -677,27 +721,7 @@ static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, if (pmd_write(pmd)) return true; - /* Maybe FOLL_FORCE is set to override it? */ - if (!(flags & FOLL_FORCE)) - return false; - - /* But FOLL_FORCE has no effect on shared mappings */ - if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED)) - return false; - - /* ... or read-only private ones */ - if (!(vma->vm_flags & VM_MAYWRITE)) - return false; - - /* ... or already writable ones that just need to take a write fault */ - if (vma->vm_flags & VM_WRITE) - return false; - - /* - * See can_change_pte_writable(): we broke COW and could map the page - * writable if we have an exclusive anonymous page ... - */ - if (!page || !PageAnon(page) || !PageAnonExclusive(page)) + if (!can_follow_write_common(page, vma, flags)) return false; /* ... and a write-fault isn't required for other reasons. */ @@ -798,27 +822,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page, if (pte_write(pte)) return true; - /* Maybe FOLL_FORCE is set to override it? */ - if (!(flags & FOLL_FORCE)) - return false; - - /* But FOLL_FORCE has no effect on shared mappings */ - if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED)) - return false; - - /* ... or read-only private ones */ - if (!(vma->vm_flags & VM_MAYWRITE)) - return false; - - /* ... or already writable ones that just need to take a write fault */ - if (vma->vm_flags & VM_WRITE) - return false; - - /* - * See can_change_pte_writable(): we broke COW and could map the page - * writable if we have an exclusive anonymous page ... - */ - if (!page || !PageAnon(page) || !PageAnonExclusive(page)) + if (!can_follow_write_common(page, vma, flags)) return false; /* ... and a write-fault isn't required for other reasons. */ @@ -1285,9 +1289,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; - /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */ - if (is_vm_hugetlb_page(vma)) - return -EFAULT; /* * We used to let the write,force case do COW in a * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ea2ed8e301ef..52517b7ce308 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5169,6 +5169,13 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma, update_mmu_cache(vma, address, ptep); } +static void set_huge_ptep_maybe_writable(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep) +{ + if (vma->vm_flags & VM_WRITE) + set_huge_ptep_writable(vma, address, ptep); +} + bool is_hugetlb_entry_migration(pte_t pte) { swp_entry_t swp; @@ -5802,13 +5809,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, if (!unshare && huge_pte_uffd_wp(pte)) return 0; - /* - * hugetlb does not support FOLL_FORCE-style write faults that keep the - * PTE mapped R/O such as maybe_mkwrite() would do. - */ - if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE))) - return VM_FAULT_SIGSEGV; - /* Let's take out MAP_SHARED mappings first. */ if (vma->vm_flags & VM_MAYSHARE) { set_huge_ptep_writable(vma, vmf->address, vmf->pte); @@ -5837,7 +5837,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, SetPageAnonExclusive(&old_folio->page); } if (likely(!unshare)) - set_huge_ptep_writable(vma, vmf->address, vmf->pte); + set_huge_ptep_maybe_writable(vma, vmf->address, + vmf->pte); delayacct_wpcopy_end(); return 0; @@ -5943,7 +5944,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, spin_lock(vmf->ptl); vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h)); if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) { - pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare); + const bool writable = !unshare && (vma->vm_flags & VM_WRITE); + pte_t newpte = make_huge_pte(vma, &new_folio->page, writable); /* Break COW or unshare */ huge_ptep_clear_flush(vma, vmf->address, vmf->pte);
FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb mappings. Since 1d8d14641fd94, we explicitly reject it. However running software on hugetlb mappings is a useful optimization. Multiple tools allow to use that such as Intel iodlr or libhugetlbfs. Cc: Muchun Song <muchun.song@linux.dev> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Hagberg <ehagberg@janestreet.com> Signed-off-by: Guillaume Morin <guillaume@morinfr.org> --- mm/gup.c | 93 ++++++++++++++++++++++++++-------------------------- mm/hugetlb.c | 20 ++++++----- 2 files changed, 58 insertions(+), 55 deletions(-)