| Message ID | 20200327170601.18563-5-kirill.shutemov@linux.intel.com |
|---|---|
| State | New, archived |
| Series | thp/khugepaged improvements and CoW semantics |
On 27 Mar 2020, at 13:05, Kirill A. Shutemov wrote:

> The page can be included into collapse as long as it doesn't have extra
> pins (from GUP or otherwise).
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/khugepaged.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 39e0994abeb8..b47edfe57f7b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -581,18 +581,26 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  		}
>
>  		/*
> -		 * cannot use mapcount: can't collapse if there's a gup pin.
> -		 * The page must only be referenced by the scanned process
> -		 * and page swap cache.
> +		 * Check if the page has any GUP (or other external) pins.
> +		 *
> +		 * The page table that maps the page has been already unlinked
> +		 * from the page table tree and this process cannot get
> +		 * additinal pin on the page.
> +		 *
> +		 * New pins can come later if the page is shared across fork,
> +		 * but not for the this process. It is fine. The other process
> +		 * cannot write to the page, only trigger CoW.
>  		 */
> -		if (page_count(page) != 1 + PageSwapCache(page)) {
> +		if (total_mapcount(page) + PageSwapCache(page) !=
> +				page_count(page)) {

Do you think having a function for this check would be better? Since the
check is used three times.

>  			/*
>  			 * Drain pagevec and retry just in case we can get rid
>  			 * of the extra pin, like in swapin case.
>  			 */
>  			lru_add_drain();
>  		}
> -		if (page_count(page) != 1 + PageSwapCache(page)) {
> +		if (total_mapcount(page) + PageSwapCache(page) !=
> +				page_count(page)) {
>  			unlock_page(page);
>  			result = SCAN_PAGE_COUNT;
>  			goto out;
> @@ -680,7 +688,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>  		} else {
>  			src_page = pte_page(pteval);
>  			copy_user_highpage(page, src_page, address, vma);
> -			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);

Maybe replace it with this?

VM_BUG_ON_PAGE(page_mapcount(src_page) + PageSwapCache(src_page) != page_count(src_page), src_page);

>  			release_pte_page(src_page);
>  			/*
>  			 * ptl mostly unnecessary, but preempt has to
> @@ -1209,12 +1216,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  			goto out_unmap;
>  		}
>
> -		/*
> -		 * cannot use mapcount: can't collapse if there's a gup pin.
> -		 * The page must only be referenced by the scanned process
> -		 * and page swap cache.
> -		 */
> -		if (page_count(page) != 1 + PageSwapCache(page)) {
> +		/* Check if the page has any GUP (or other external) pins */
> +		if (total_mapcount(page) + PageSwapCache(page) !=
> +				page_count(page)) {
>  			result = SCAN_PAGE_COUNT;
>  			goto out_unmap;
>  		}
> --
> 2.26.0

Thanks.

—
Best Regards,
Yan Zi
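For reference, a helper of the kind suggested above might look like the sketch below. The name and placement are hypothetical (it is not part of the posted patch); it simply wraps the check used in the hunks above, assuming the usual kernel definitions of total_mapcount(), PageSwapCache() and page_count() from linux/mm.h and related headers.

```c
/*
 * Hypothetical helper (illustration only, name invented): returns true when
 * every reference to the page is accounted for by its mappings plus the swap
 * cache, i.e. there is no extra pin from GUP or anything else.
 */
static inline bool page_has_no_extra_pins(struct page *page)
{
	return total_mapcount(page) + PageSwapCache(page) ==
	       page_count(page);
}
```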
On Fri, Mar 27, 2020 at 11:20 AM Zi Yan <zi.yan@sent.com> wrote:
>
> On 27 Mar 2020, at 13:05, Kirill A. Shutemov wrote:
>
> > The page can be included into collapse as long as it doesn't have extra
> > pins (from GUP or otherwise).
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

<snip>

> > @@ -680,7 +688,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >  		} else {
> >  			src_page = pte_page(pteval);
> >  			copy_user_highpage(page, src_page, address, vma);
> > -			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
>
> Maybe replace it with this?
>
> VM_BUG_ON_PAGE(page_mapcount(src_page) + PageSwapCache(src_page) != page_count(src_page), src_page);

I don't think this is correct either. If a THP is PTE mapped its
refcount would be bumped by the number of PTE mapped subpages. But
page_mapcount() would just return the mapcount of that specific
subpage. So, total_mapcount() should be used, but the same check has
been done before reaching here.

<snip>

> Thanks.
>
> —
> Best Regards,
> Yan Zi
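To make the point about PTE-mapped THPs concrete, here is a simplified accounting sketch (my own illustration, not from the thread) for an anonymous THP whose PMD mapping has been split into PTEs in a single process, with no swap cache entry and no external pins:

```c
/*
 * Simplified accounting (assumes each PTE mapping holds one reference on the
 * compound page and nothing else references it):
 *
 *   total_mapcount(head)    == HPAGE_PMD_NR   // one PTE mapping per subpage
 *   page_count(subpage)     == HPAGE_PMD_NR   // refcount lives on the head
 *   page_mapcount(subpage)  == 1              // only that subpage's mapping
 *
 * The suggested per-subpage assertion would therefore compare 1 + 0 against
 * HPAGE_PMD_NR and trip even though nothing is wrong; total_mapcount() is the
 * right counter, and that check already ran during isolation.
 */
```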
<snip>

>>>  			/*
>>>  			 * Drain pagevec and retry just in case we can get rid
>>>  			 * of the extra pin, like in swapin case.
>>>  			 */
>>>  			lru_add_drain();
>>>  		}
>>> -		if (page_count(page) != 1 + PageSwapCache(page)) {
>>> +		if (total_mapcount(page) + PageSwapCache(page) !=
>>> +				page_count(page)) {
>>>  			unlock_page(page);
>>>  			result = SCAN_PAGE_COUNT;
>>>  			goto out;
>>> @@ -680,7 +688,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>  		} else {
>>>  			src_page = pte_page(pteval);
>>>  			copy_user_highpage(page, src_page, address, vma);
>>> -			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
>>
>> Maybe replace it with this?
>>
>> VM_BUG_ON_PAGE(page_mapcount(src_page) + PageSwapCache(src_page) != page_count(src_page), src_page);
>
> I don't think this is correct either. If a THP is PTE mapped its
> refcount would be bumped by the number of PTE mapped subpages. But
> page_mapcount() would just return the mapcount of that specific
> subpage. So, total_mapcount() should be used, but the same check has
> been done before reaching here.

Yes, you are right. Thanks. Please disregard this comment.

—
Best Regards,
Yan Zi
```diff
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 39e0994abeb8..b47edfe57f7b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -581,18 +581,26 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		}
 
 		/*
-		 * cannot use mapcount: can't collapse if there's a gup pin.
-		 * The page must only be referenced by the scanned process
-		 * and page swap cache.
+		 * Check if the page has any GUP (or other external) pins.
+		 *
+		 * The page table that maps the page has been already unlinked
+		 * from the page table tree and this process cannot get
+		 * additinal pin on the page.
+		 *
+		 * New pins can come later if the page is shared across fork,
+		 * but not for the this process. It is fine. The other process
+		 * cannot write to the page, only trigger CoW.
 		 */
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			/*
 			 * Drain pagevec and retry just in case we can get rid
 			 * of the extra pin, like in swapin case.
 			 */
 			lru_add_drain();
 		}
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			unlock_page(page);
 			result = SCAN_PAGE_COUNT;
 			goto out;
@@ -680,7 +688,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 		} else {
 			src_page = pte_page(pteval);
 			copy_user_highpage(page, src_page, address, vma);
-			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
 			release_pte_page(src_page);
 			/*
 			 * ptl mostly unnecessary, but preempt has to
@@ -1209,12 +1216,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
-		/*
-		 * cannot use mapcount: can't collapse if there's a gup pin.
-		 * The page must only be referenced by the scanned process
-		 * and page swap cache.
-		 */
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		/* Check if the page has any GUP (or other external) pins */
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
```
The page can be included into collapse as long as it doesn't have extra
pins (from GUP or otherwise).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)
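As a rough sketch of why the new expression catches such pins (my own simplification; transient references such as those held by LRU pagevecs are ignored here, which is why the code retries after lru_add_drain()):

```c
/*
 * Expected references when no external pin exists:
 *
 *     page_count(page) == total_mapcount(page) + PageSwapCache(page)
 *
 * A get_user_pages() caller (or any other pinner) takes an extra reference
 * without adding a mapping, so page_count() exceeds the sum above and
 * khugepaged bails out with SCAN_PAGE_COUNT instead of collapsing the range.
 */
```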