Message ID | 20230818211533.2523697-1-zokeefe@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2,1/2] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" | expand |
On Fri, Aug 18, 2023 at 2:15 PM Zach O'Keefe <zokeefe@google.com> wrote: > > The 6.0 commits: > > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()") > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") > > merged "can we have THPs in this VMA?" logic that was previously done > separately by fault-path, khugepaged, and smaps "THPeligible" checks. > > During the process, the semantics of the fault path check changed in two > ways: > > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path). > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault > handler that could satisfy the fault. Previously, this check had been > done in create_huge_pud() and create_huge_pmd() routines, but after > the changes, we never reach those routines. > > During the review of the above commits, it was determined that in-tree > users weren't affected by the change; most notably, since the only relevant > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is > explicitly approved early in approval logic. However, there is at least > one occurrence where an out-of-tree driver that used > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken. > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give > any ->huge_fault handler a chance to handle the fault. Note that we > don't validate the file mode or mapping alignment, which is consistent > with the behavior before the aforementioned commits. > > Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") > Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > Cc: Yang Shi <shy828301@gmail.com> > --- > Changed from v1[1]: > - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists > > [1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/ Thanks, Zach. The patch looks correct to me. You can add Reviewed-by:Yang Shi <shy828301@gmail.com>. A further comment below... > > --- > mm/huge_memory.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index eb3678360b97..cd379b2c077b 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, > return in_pf; > > /* > - * Special VMA and hugetlb VMA. > + * khugepaged special VMA and hugetlb VMA. > * Must be checked after dax since some dax mappings may have > * VM_MIXEDMAP set. > */ > - if (vm_flags & VM_NO_KHUGEPAGED) > + if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED)) I'm wondering whether we shall remove VM_MIXEDMAP from VM_NO_KHUGEPAGED or not if that kind VMAs are huge page applicable for some usecases. The downside may be some CPU time waste on the VM_MIXEDMAP area which has PFN instead of struct page, but it should be ok. Anything else did I miss? Just back from a long vacation, my brain is still not running in full speed yet. > return false; > > /* > @@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, > !hugepage_flags_always()))) > return false; > > - /* Only regular file is valid */ > - if (!in_pf && file_thp_enabled(vma)) > - return true; > - > if (!vma_is_anonymous(vma)) > - return false; > + return in_pf ? > + /* > + * Trust that ->huge_fault() handlers know > + * what they are doing in fault path. > + */ > + !!vma->vm_ops->huge_fault : > + /* Only regular file is valid in collapse path */ > + file_thp_enabled(vma); > > if (vma_is_temporary_stack(vma)) > return false; > -- > 2.42.0.rc1.204.g551eb34607-goog >
On Mon, Aug 21, 2023 at 3:53 PM Yang Shi <shy828301@gmail.com> wrote: > > On Fri, Aug 18, 2023 at 2:15 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > The 6.0 commits: > > > > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()") > > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") > > > > merged "can we have THPs in this VMA?" logic that was previously done > > separately by fault-path, khugepaged, and smaps "THPeligible" checks. > > > > During the process, the semantics of the fault path check changed in two > > ways: > > > > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path). > > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault > > handler that could satisfy the fault. Previously, this check had been > > done in create_huge_pud() and create_huge_pmd() routines, but after > > the changes, we never reach those routines. > > > > During the review of the above commits, it was determined that in-tree > > users weren't affected by the change; most notably, since the only relevant > > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is > > explicitly approved early in approval logic. However, there is at least > > one occurrence where an out-of-tree driver that used > > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken. > > > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give > > any ->huge_fault handler a chance to handle the fault. Note that we > > don't validate the file mode or mapping alignment, which is consistent > > with the behavior before the aforementioned commits. > > > > Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") > > Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > Cc: Yang Shi <shy828301@gmail.com> > > --- > > Changed from v1[1]: > > - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists > > > > [1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/ > > Thanks, Zach. The patch looks correct to me. You can add > Reviewed-by:Yang Shi <shy828301@gmail.com>. Hey Yang, thanks for taking the time to review .. and .. welcome back :) Sorry to do this to you, but while responding to you on another thread I realized an issue below: > A further comment below... > > > > > --- > > mm/huge_memory.c | 17 ++++++++++------- > > 1 file changed, 10 insertions(+), 7 deletions(-) > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index eb3678360b97..cd379b2c077b 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, > > return in_pf; > > > > /* > > - * Special VMA and hugetlb VMA. > > + * khugepaged special VMA and hugetlb VMA. > > * Must be checked after dax since some dax mappings may have > > * VM_MIXEDMAP set. > > */ > > - if (vm_flags & VM_NO_KHUGEPAGED) > > + if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED)) > > I'm wondering whether we shall remove VM_MIXEDMAP from > VM_NO_KHUGEPAGED or not if that kind VMAs are huge page applicable for > some usecases. The downside may be some CPU time waste on the > VM_MIXEDMAP area which has PFN instead of struct page, but it should > be ok. Anything else did I miss? Just back from a long vacation, my > brain is still not running in full speed yet. I was thinking about the same thing, and had originally intended to raise that question here -- but thought it better to stick with the immediate issue. Ironically, we've gone off on both a THPeligible tangent and another about faulting file-backed memory. But ya, AFAIU, there is no technical reason why collapse can't act on VM_MIXEDMAP, as long as all the pages it finds are vm_normal() pages. I don't know enough about the possible use cases here though, and whether this is the best memory to be allocating precious hugepages to. You also raise a good point about cpu usage, since there may be a greater chance of failing late in scan due finding a PFN-only mapping. > > return false; > > > > /* > > @@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, > > !hugepage_flags_always()))) > > return false; > > > > - /* Only regular file is valid */ > > - if (!in_pf && file_thp_enabled(vma)) > > - return true; > > - > > if (!vma_is_anonymous(vma)) > > - return false; > > + return in_pf ? > > + /* > > + * Trust that ->huge_fault() handlers know > > + * what they are doing in fault path. > > + */ > > + !!vma->vm_ops->huge_fault : > > + /* Only regular file is valid in collapse path */ > > + file_thp_enabled(vma); This works for fault and collapse paths, but what about smaps? I think we should be doing both checks, and returning "true" if either is true. This also raises the question of how hugepage_vma_check() is set up, and how we've been using "in_pf" and "smaps". Today, these mean, "am I in fault path?" and "am I in smaps path?", whereas I think they ought to be, "should I check fault path, else check collapse path", and "am I in smaps path?". smaps path should then use hugepage_vma_check(in_pf) || hugepage_vma_check(!in_pf). It a depends on how pedantic we want to be about THPeligible, but I've found a few corner cases where the distinction matters. What I think I'll do is send off an embarrassing 3rd revision of this simple patch -- removing Patch 2 that was previously included in v2 -- just so we have a shot of getting the fix for Saurabh into 6.6. We can worry about any other refactorings / fixes later.. Thanks, Zach > > if (vma_is_temporary_stack(vma)) > > return false; > > -- > > 2.42.0.rc1.204.g551eb34607-goog > >
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index eb3678360b97..cd379b2c077b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, return in_pf; /* - * Special VMA and hugetlb VMA. + * khugepaged special VMA and hugetlb VMA. * Must be checked after dax since some dax mappings may have * VM_MIXEDMAP set. */ - if (vm_flags & VM_NO_KHUGEPAGED) + if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED)) return false; /* @@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, !hugepage_flags_always()))) return false; - /* Only regular file is valid */ - if (!in_pf && file_thp_enabled(vma)) - return true; - if (!vma_is_anonymous(vma)) - return false; + return in_pf ? + /* + * Trust that ->huge_fault() handlers know + * what they are doing in fault path. + */ + !!vma->vm_ops->huge_fault : + /* Only regular file is valid in collapse path */ + file_thp_enabled(vma); if (vma_is_temporary_stack(vma)) return false;
The 6.0 commits: commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()") commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") merged "can we have THPs in this VMA?" logic that was previously done separately by fault-path, khugepaged, and smaps "THPeligible" checks. During the process, the semantics of the fault path check changed in two ways: 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path). 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault handler that could satisfy the fault. Previously, this check had been done in create_huge_pud() and create_huge_pmd() routines, but after the changes, we never reach those routines. During the review of the above commits, it was determined that in-tree users weren't affected by the change; most notably, since the only relevant user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is explicitly approved early in approval logic. However, there is at least one occurrence where an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken. Remove the VM_NO_KHUGEPAGED check when not in collapse path and give any ->huge_fault handler a chance to handle the fault. Note that we don't validate the file mode or mapping alignment, which is consistent with the behavior before the aforementioned commits. Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()") Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com> Signed-off-by: Zach O'Keefe <zokeefe@google.com> Cc: Yang Shi <shy828301@gmail.com> --- Changed from v1[1]: - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists [1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/ --- mm/huge_memory.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)