Message ID | 20220823135841.934465-2-haiyue.wang@intel.com (mailing list archive) |
---|---|
State | New |
Series | fix follow_page related issues |
On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:

> Not all huge page APIs support FOLL_GET option, so move_pages() syscall
> will fail to get the page node information for some huge pages.
>
> Like x86 on linux 5.19 with 1GB huge page API follow_huge_pud(), it will
> return NULL page for FOLL_GET when calling move_pages() syscall with the
> NULL 'nodes' parameter, the 'status' parameter has '-2' error in array.
>
> Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
>
> But these huge page APIs don't support FOLL_GET:
> 1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> 2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
>    It will cause WARN_ON_ONCE for FOLL_GET.
> 3. follow_huge_pgd() in mm/hugetlb.c

What happened to the proposal to fix these three sites so this patch is
not needed?

> This is a temporary solution to mitigate the side effect of the race
> condition fix by calling follow_page() with FOLL_GET set for huge pages.
>
> After supporting follow huge page by FOLL_GET is done, this fix can be
> reverted safely.
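As a point of reference, the failure mode described above can be reproduced from
userspace with a small program along the following lines. This is only a sketch:
the 1GB mapping flags, the need for a reserved 1GB hugetlb page, and linking
against libnuma for the move_pages() wrapper (-lnuma) are assumptions, not details
taken from the thread. With a NULL 'nodes' argument, move_pages() only reports
placement, and on an affected kernel status[0] comes back as -2 (-ENOENT) instead
of a node id.

/*
 * Sketch only: query the node of a 1GB hugetlb page with move_pages().
 * Assumes a reserved 1GB huge page and libnuma (link with -lnuma).
 */
#define _GNU_SOURCE
#include <numaif.h>             /* move_pages() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT  26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)
#endif

int main(void)
{
        size_t len = 1UL << 30;         /* one 1GB huge page */
        void *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                          -1, 0);
        if (area == MAP_FAILED) {
                perror("mmap(MAP_HUGETLB | MAP_HUGE_1GB)");
                return EXIT_FAILURE;
        }
        *(volatile char *)area = 1;     /* touch it so the huge page is faulted in */

        void *pages[1] = { area };
        int status[1] = { -1 };

        /* nodes == NULL: do not migrate, just report where each page sits */
        if (move_pages(0 /* self */, 1, pages, NULL, status, 0) < 0) {
                perror("move_pages");
                return EXIT_FAILURE;
        }
        printf("status[0] = %d (node id, or -2/-ENOENT on affected kernels)\n",
               status[0]);
        return 0;
}

Built with something like 'gcc query_node.c -lnuma' (the file name is arbitrary)
and run after reserving a 1GB huge page, for example via
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages, the expected output on
a fixed kernel is the NUMA node of the mapping.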
On Wed, 24 Aug 2022 11:38:58 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
>
> > Not all huge page APIs support FOLL_GET option, so move_pages() syscall
> > will fail to get the page node information for some huge pages.
> >
> > Like x86 on linux 5.19 with 1GB huge page API follow_huge_pud(), it will
> > return NULL page for FOLL_GET when calling move_pages() syscall with the
> > NULL 'nodes' parameter, the 'status' parameter has '-2' error in array.
> >
> > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> > Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> >
> > But these huge page APIs don't support FOLL_GET:
> > 1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> > 2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> >    It will cause WARN_ON_ONCE for FOLL_GET.
> > 3. follow_huge_pgd() in mm/hugetlb.c
>
> What happened to the proposal to fix these three sites so this patch is
> not needed?

For s390, you can add my patch from
https://lore.kernel.org/linux-mm/20220818135717.609eef8a@thinkpad/
to this series. Or we can bring it upstream via s390 tree, whatever suits best.

It certainly makes sense to have, also independent from this series.

Adding some s390 people on cc.
On Thu, 25 Aug 2022 14:39:17 +0200 Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:

> On Wed, 24 Aug 2022 11:38:58 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
> >
> > > Not all huge page APIs support FOLL_GET option, so move_pages() syscall
> > > will fail to get the page node information for some huge pages.
> > >
> > > Like x86 on linux 5.19 with 1GB huge page API follow_huge_pud(), it will
> > > return NULL page for FOLL_GET when calling move_pages() syscall with the
> > > NULL 'nodes' parameter, the 'status' parameter has '-2' error in array.
> > >
> > > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> > > Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> > >
> > > But these huge page APIs don't support FOLL_GET:
> > > 1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> > > 2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> > >    It will cause WARN_ON_ONCE for FOLL_GET.
> > > 3. follow_huge_pgd() in mm/hugetlb.c
> >
> > What happened to the proposal to fix these three sites so this patch is
> > not needed?
>
> For s390, you can add my patch from
> https://lore.kernel.org/linux-mm/20220818135717.609eef8a@thinkpad/
> to this series.
>

Thanks, I added that.
On 08/24/22 11:38, Andrew Morton wrote:
> On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
>
> > Not all huge page APIs support FOLL_GET option, so move_pages() syscall
> > will fail to get the page node information for some huge pages.
> >
> > Like x86 on linux 5.19 with 1GB huge page API follow_huge_pud(), it will
> > return NULL page for FOLL_GET when calling move_pages() syscall with the
> > NULL 'nodes' parameter, the 'status' parameter has '-2' error in array.
> >
> > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> > Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> >
> > But these huge page APIs don't support FOLL_GET:
> > 1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> > 2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> >    It will cause WARN_ON_ONCE for FOLL_GET.
> > 3. follow_huge_pgd() in mm/hugetlb.c
>
> What happened to the proposal to fix these three sites so this patch is
> not needed?
>

Note mpe's comments here:
https://lore.kernel.org/linux-mm/87r113jgqn.fsf@mpe.ellerman.id.au/

It has been a while since powerpc supported hugetlb pages at the PGD level,
and it was the only architecture with hugetlb pages at that level. Unless I
am missing something, this means there is no issue with follow_huge_pgd(),
as there is no way that code could be invoked when commit 4cd614841c06 was
made.
diff --git a/mm/migrate.c b/mm/migrate.c
index 6a1597c92261..581dfaad9257 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1848,6 +1848,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
+		unsigned int foll_flags = FOLL_DUMP;
 		struct vm_area_struct *vma;
 		struct page *page;
 		int err = -EFAULT;
@@ -1856,8 +1857,12 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (!vma)
 			goto set_status;
 
+		/* Not all huge page follow APIs support 'FOLL_GET' */
+		if (!is_vm_hugetlb_page(vma))
+			foll_flags |= FOLL_GET;
+
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, addr, foll_flags);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
@@ -1865,7 +1870,8 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 		if (page && !is_zone_device_page(page)) {
 			err = page_to_nid(page);
-			put_page(page);
+			if (foll_flags & FOLL_GET)
+				put_page(page);
 		} else {
 			err = -ENOENT;
 		}
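To make the net effect easier to see, here is a sketch of one iteration of the
per-page loop in do_pages_stat_array() with the patch applied. It is reconstructed
from the hunk context above plus mainline mm/migrate.c of that period, so the
lines outside the hunks are an approximation rather than a verbatim copy of the
posted tree:

		/* sketch: one iteration of the loop, after the patch */
		unsigned long addr = (unsigned long)(*pages);
		unsigned int foll_flags = FOLL_DUMP;
		struct vm_area_struct *vma;
		struct page *page;
		int err = -EFAULT;

		vma = vma_lookup(mm, addr);
		if (!vma)
			goto set_status;

		/* hugetlb follow_page() paths may not support taking a reference */
		if (!is_vm_hugetlb_page(vma))
			foll_flags |= FOLL_GET;

		/* FOLL_DUMP to ignore special (like zero) pages */
		page = follow_page(vma, addr, foll_flags);

		err = PTR_ERR(page);
		if (IS_ERR(page))
			goto set_status;

		if (page && !is_zone_device_page(page)) {
			err = page_to_nid(page);
			/* a reference was only taken when FOLL_GET was set */
			if (foll_flags & FOLL_GET)
				put_page(page);
		} else {
			err = -ENOENT;
		}
set_status:
		*status = err;

The point of the conditional at the end is that put_page() drops a reference that
is only taken when FOLL_GET is actually passed; for hugetlb VMAs the flag is left
out, so the reference drop has to be skipped as well.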