Message ID | 20190613175747.1964753-6-songliubraving@fb.com (mailing list archive)
---|---
State | New, archived |
Series | THP aware uprobe
On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
> After all uprobes are removed from the huge page (with PTE pgtable), it
> is possible to collapse the pmd and benefit from THP again. This patch
> does the collapse.
>
> An issue in an earlier version was discovered by the kbuild test robot.
>
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  include/linux/huge_mm.h | 7 +++++
>  kernel/events/uprobes.c | 5 ++-
>  mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++

I still think it's a duplication of khugepaged functionality. We need to
fix khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code
to be able to call for collapse of a particular range if we have all locks
taken (as we do in the uprobe case).
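For illustration, the refactoring Kirill describes might expose an entry
point along the lines of the sketch below; the helper name
collapse_pte_mapped_thp() and its signature are assumptions for this
sketch, not code from the posted series:

/* Hypothetical interface sketch (name and signature assumed): khugepaged
 * refactored so that a caller that already holds the needed locks can ask
 * for one PMD-sized range to be collapsed directly. */

/* include/linux/khugepaged.h */
extern void collapse_pte_mapped_thp(struct mm_struct *mm,
                                    unsigned long addr);

/* uprobe-removal path; the caller is assumed to hold mmap_sem, as
 * uprobe_write_opcode() does */
static void uprobe_collapse_range(struct vm_area_struct *vma,
                                  unsigned long vaddr)
{
        collapse_pte_mapped_thp(vma->vm_mm, vaddr & HPAGE_PMD_MASK);
}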
> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> I still think it's a duplication of khugepaged functionality. We need to
> fix khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code
> to be able to call for collapse of a particular range if we have all locks
> taken (as we do in the uprobe case).

I see the point now. I misunderstood it for a while.

If we add this to khugepaged, it will have some conflicts with my other
patchset. How about we move the functionality to khugepaged after these
two sets get in?

Thanks,
Song
On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
> I see the point now. I misunderstood it for a while.
>
> If we add this to khugepaged, it will have some conflicts with my other
> patchset. How about we move the functionality to khugepaged after these
> two sets get in?

Is the last patch of the patchset essential? I think this part can be done
a bit later in a proper way, no?
> On Jun 21, 2019, at 6:36 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> Is the last patch of the patchset essential? I think this part can be done
> a bit later in a proper way, no?

Technically, we need this patch to regroup the pmd-mapped page, and thus
get the performance benefit back after the uprobe is detached.

On the other hand, if we get the first 4 patches of this set and the other
set in soonish, I will work on improving this patch right after that.

Thanks,
Song
> On Jun 21, 2019, at 6:45 AM, Song Liu <songliubraving@fb.com> wrote:
>
> Technically, we need this patch to regroup the pmd-mapped page, and thus
> get the performance benefit back after the uprobe is detached.
>
> On the other hand, if we get the first 4 patches of this set and the other
> set in soonish, I will work on improving this patch right after that.

Actually, it might be pretty easy. We can just call try_collapse_huge_pmd()
in khugepaged.c (in khugepaged_scan_shmem(), or khugepaged_scan_file() after
my other set).

Let me fold that in and send v5.

Thanks,
Song
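A rough sketch of the hook Song proposes (the wrapper function below is
invented for illustration, the surrounding scan loop is paraphrased, and
khugepaged_scan_file() only exists once his other patchset lands): where
the scan currently gives up with SCAN_PAGE_COMPOUND on a PTE-mapped
compound page, it could first try to re-collapse the pmd.

/* Illustrative only; SCAN_PAGE_COMPOUND and SCAN_SUCCEED are the
 * scan-result codes local to mm/khugepaged.c. */
static int khugepaged_try_recollapse(struct vm_area_struct *vma,
                                     struct page *page)
{
        if (PageTransCompound(page)) {
                /* try_collapse_huge_pmd() takes the page lock and
                 * the pmd lock internally */
                try_collapse_huge_pmd(vma, page);
                return SCAN_PAGE_COMPOUND;
        }
        return SCAN_SUCCEED;
}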
> On Jun 21, 2019, at 9:30 AM, Song Liu <songliubraving@fb.com> wrote:
>
> Actually, it might be pretty easy. We can just call try_collapse_huge_pmd()
> in khugepaged.c (in khugepaged_scan_shmem(), or khugepaged_scan_file() after
> my other set).
>
> Let me fold that in and send v5.

On second thought, if we have khugepaged do the collapse, we need a
dedicated bit to tell khugepaged which pmd to collapse. Otherwise, it may
accidentally collapse pmds that were split by other split_huge_pmd() calls.

I could not think of a good solution here. In this case, we really need a
flag for this specific pmd. Flags on the compound page or on the vma would
not work, as those can be shared by multiple pmds.

If the analysis above is correct, we probably need uprobe to specifically
call try_collapse_huge_pmd() for now.

Please share your suggestions. Thanks!

Song
On Fri, Jun 21, 2019 at 06:04:14PM +0000, Song Liu wrote:
> On second thought, if we have khugepaged do the collapse, we need a
> dedicated bit to tell khugepaged which pmd to collapse. Otherwise, it may
> accidentally collapse pmds that were split by other split_huge_pmd() calls.

Why is it a problem? Do you know a situation where such a collapse is
possible and would break a split_huge_pmd() user's expectations? If there
is such a user, it is broken: normal locking should prevent this situation.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..30669e9a9340 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,9 @@ static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+extern void try_collapse_huge_pmd(struct vm_area_struct *vma,
+				  struct page *page);
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -368,6 +371,10 @@ static inline bool thp_migration_supported(void)
 {
 	return false;
 }
+
+static inline void try_collapse_huge_pmd(struct vm_area_struct *vma,
+					 struct page *page) {}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a20d7b43a056..9bec602bf79e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -474,6 +474,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
 	int ret, is_register, ref_ctr_updated = 0;
+	struct page *orig_page = NULL;
 
 	is_register = is_swbp_insn(&opcode);
 	uprobe = container_of(auprobe, struct uprobe, arch);
@@ -512,7 +513,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
 	if (!is_register) {
-		struct page *orig_page;
 		pgoff_t index;
 
 		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
@@ -540,6 +540,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret && is_register && ref_ctr_updated)
 		update_ref_ctr(uprobe, mm, -1);
 
+	if (!ret && orig_page && PageTransCompound(orig_page))
+		try_collapse_huge_pmd(vma, orig_page);
+
 	return ret;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..cc8464650b72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2886,6 +2886,75 @@ static struct shrinker deferred_split_shrinker = {
 	.flags = SHRINKER_NUMA_AWARE,
 };
 
+/**
+ * try_collapse_huge_pmd - try collapse pmd for a pte mapped huge page
+ * @vma: vma containing the huge page
+ * @page: any sub page of the huge page
+ */
+void try_collapse_huge_pmd(struct vm_area_struct *vma,
+			   struct page *page)
+{
+	struct page *hpage = compound_head(page);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long haddr;
+	unsigned long addr;
+	pmd_t *pmd, _pmd;
+	spinlock_t *ptl;
+	int i, count = 0;
+
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
+
+	haddr = page_address_in_vma(hpage, vma);
+	pmd = mm_find_pmd(mm, haddr);
+	if (!pmd)
+		return;
+
+	lock_page(hpage);
+	ptl = pmd_lock(mm, pmd);
+
+	/* step 1: check all mapped PTEs */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+
+		if (pte_none(*pte))
+			continue;
+		if (hpage + i != vm_normal_page(vma, addr, *pte)) {
+			spin_unlock(ptl);
+			unlock_page(hpage);
+			return;
+		}
+		count++;
+	}
+
+	/* step 2: adjust rmap */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+		struct page *p;
+
+		if (pte_none(*pte))
+			continue;
+		p = vm_normal_page(vma, addr, *pte);
+		page_remove_rmap(p, false);
+	}
+
+	/* step 3: flip page table */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	/* step 4: free pgtable, set refcount, mm_counters, etc. */
+	page_ref_sub(page, count);
+	unlock_page(hpage);
+	mm_dec_nr_ptes(mm);
+	pte_free(mm, pmd_pgtable(_pmd));
+	add_mm_counter(mm, mm_counter_file(page), -count);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int split_huge_pages_set(void *data, u64 val)
 {
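As a worked example of the accounting in step 4, assume x86-64 (where
HPAGE_PMD_NR is 512) and that step 1 found every PTE mapped, so count is
512. The block below merely restates the hunk above with concrete numbers;
it is an annotation for this archive, not extra code in the patch:

/* step 4 with count == 512 */
page_ref_sub(page, 512);                /* drop the 512 references held by
                                         * the individual PTE mappings */
mm_dec_nr_ptes(mm);                     /* one PTE table goes away */
pte_free(mm, pmd_pgtable(_pmd));        /* free that PTE table; the pmd
                                         * entry was already cleared and
                                         * flushed by pmdp_collapse_flush() */
add_mm_counter(mm, mm_counter_file(page), -512);
                                        /* 512 small file mappings gone */

The compound page itself stays in the page cache, so a later fault on the
range can be served by a single PMD mapping again, which is where the THP
benefit the commit message promises comes back.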
After all uprobes are removed from the huge page (with PTE pgtable), it
is possible to collapse the pmd and benefit from THP again. This patch
does the collapse.

An issue in an earlier version was discovered by the kbuild test robot.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/huge_mm.h | 7 +++++
 kernel/events/uprobes.c | 5 ++-
 mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)