| Message ID | 1585750678-5840-1-git-send-email-kernelfans@gmail.com |
|---|---|
| State | New, archived |
| Series | mm/rmap: fix the handling of !private device page in try_to_unmap_one() |
On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> This patch is a pure code refinement without any functional changes.
>
> try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> down and the mapcount decremented.

I haven't really checked the full history of the rmap walk, but this is
certainly not the currently implemented semantic of this callback.
Returning true only tells the caller that it should continue with other
VMAs which map the given page. It doesn't really mean that the pte has
been torn down. The munlock case is a nice example of how that is used
properly, while the migration path for device pages shows how it is used
incorrectly: it doesn't make any sense to walk other VMAs, because
is_device_private_page() is a property of the page, not the VMA. And that
is the only reason to drop that.

> Apparently the current code
>
>	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>	    is_zone_device_page(page) && !is_device_private_page(page))
>		return true;
>
> conflicts with this logic.
>
> Furthermore, as for zone_device, migration can only happen on
> is_device_private_page(page). For other zone_device pages,
> memmap_init_zone_device() raises an extra _refcount on all of the zone's
> pages, and this extra _refcount will block migration. So try_to_unmap_one()
> can simply return false for such zone_device pages.
>
> Why the original code happens to work on !private zone_device pages:
> -1. if the page is mapped, try_to_unmap_one()->page_remove_rmap() is skipped,
>     and finally try_to_unmap(){ return !page_mapcount(page) ? true : false;}
>     will return false.
> -2. if the page is not mapped, the extra _refcount will prevent the migration.

And this sounds like a separate change to me worth a patch on its own.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> To: linux-mm@kvack.org
> ---
> v1 -> v2: improve commit log and note in code
>  mm/rmap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b838647..723af4f 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1358,6 +1358,10 @@ void page_remove_rmap(struct page *page, bool compound)
>
>  /*
>   * @arg: enum ttu_flags will be passed to this argument
> + *
> + * For munlock, return true if @page is not mlocked by @vma without killing pte
> + * For unmap, return true after tearing down pte.
> + * For both cases, return false if rmap_walk should be stopped.
>   */
>  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		unsigned long address, void *arg)
> @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
> -		return true;
> +		return false;
>
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
> @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>
>  	if (IS_ENABLED(CONFIG_MIGRATION) &&
>  	    (flags & TTU_MIGRATION) &&
> -	    is_zone_device_page(page)) {
> +	    is_device_private_page(page)) {
>  		swp_entry_t entry;
>  		pte_t swp_pte;
>
> --
> 2.7.5
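[Editorial note: the callback semantic Michal describes matches the shape of the rmap walk loop itself. Below is a simplified sketch, not verbatim kernel code, of the per-VMA loop in mm/rmap.c:rmap_walk_anon() around the v5.6 era; locking and interval-tree setup are elided.]

```c
/*
 * Sketch of the per-VMA loop in rmap_walk_anon() (mm/rmap.c, ~v5.6).
 * The callback's boolean return only steers the walk: true means
 * "keep going with the next VMA mapping this page", false means
 * "stop the walk".  It says nothing about whether a pte was torn down.
 */
anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
			       pgoff_start, pgoff_end) {
	struct vm_area_struct *vma = avc->vma;
	unsigned long address = vma_address(page, vma);

	cond_resched();

	if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
		continue;	/* skip VMAs the caller declared irrelevant */

	if (!rwc->rmap_one(page, vma, address, rwc->arg))
		break;		/* callback asked to stop the whole walk */
	if (rwc->done && rwc->done(page))
		break;		/* caller-provided completion check */
}
```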
On Wed, Apr 1, 2020 at 11:58 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> > This patch is a pure code refinement without any functional changes.
> >
> > try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> > unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> > down and the mapcount decremented.
>
> I haven't really checked the full history of the rmap walk, but this is
> certainly not the currently implemented semantic of this callback.
> Returning true only tells the caller that it should continue with other
> VMAs which map the given page. It doesn't really mean that the pte has
> been torn down. The munlock case is a nice example of how that is used

I did not put the whole story in the commit log, but noted it in the code.
For munlock, we only care that the page will be put on the correct lru. But,
as the commit log says ("As for unmap ..."), unmap should tear down the pte,
otherwise the page may still be accessed through an old mapping. Also, an
assumption is omitted here: for !private device pages, e.g. fs-dax, there
is no need to mlock them.

> properly, while the migration path for device pages shows how it is used
> incorrectly: it doesn't make any sense to walk other VMAs, because
> is_device_private_page() is a property of the page, not the VMA. And that
> is the only reason to drop that.
>
> > Apparently the current code
> >
> >	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >	    is_zone_device_page(page) && !is_device_private_page(page))
> >		return true;
> >
> > conflicts with this logic.
> >
[...]
> >  /*
> >   * @arg: enum ttu_flags will be passed to this argument
> > + *
> > + * For munlock, return true if @page is not mlocked by @vma without killing pte

Here is the note for the munlock case.

Thanks,
Pingfan

> > + * For unmap, return true after tearing down pte.
> > + * For both cases, return false if rmap_walk should be stopped.
> >   */
> >  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >  		unsigned long address, void *arg)
> > @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >  	    is_zone_device_page(page) && !is_device_private_page(page))
> > -		return true;
> > +		return false;
> >
> >  	if (flags & TTU_SPLIT_HUGE_PMD) {
> >  		split_huge_pmd_address(vma, address,
> > @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >  	if (IS_ENABLED(CONFIG_MIGRATION) &&
> >  	    (flags & TTU_MIGRATION) &&
> > -	    is_zone_device_page(page)) {
> > +	    is_device_private_page(page)) {
> >  		swp_entry_t entry;
> >  		pte_t swp_pte;
> >
> > --
> > 2.7.5
>
> --
> Michal Hocko
> SUSE Labs
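[Editorial note: the munlock behaviour Pingfan refers to corresponds to two spots in try_to_unmap_one(). A minimal sketch, assuming the ~v5.6 layout of mm/rmap.c and with error handling elided:]

```c
/*
 * Early in try_to_unmap_one(): munlock has nothing to gain from
 * examining VMAs that are not mlocked, so keep walking.
 */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
	return true;	/* continue: another VMA may hold the mlock */

/*
 * Inside the page_vma_mapped_walk() loop: once a VM_LOCKED mapping
 * is found, the page is moved to the unevictable list and the walk
 * is stopped by returning false.  No pte is torn down here.
 */
if (vma->vm_flags & VM_LOCKED) {
	/* PTE-mapped THP are never mlocked */
	if (!PageTransCompound(page))
		mlock_vma_page(page);	/* put page on the correct lru */
	ret = false;			/* mlocked mapping found: stop walk */
	page_vma_mapped_walk_done(&pvmw);
	break;
}
```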
diff --git a/mm/rmap.c b/mm/rmap.c
index b838647..723af4f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1358,6 +1358,10 @@ void page_remove_rmap(struct page *page, bool compound)
 
 /*
  * @arg: enum ttu_flags will be passed to this argument
+ *
+ * For munlock, return true if @page is not mlocked by @vma without killing pte
+ * For unmap, return true after tearing down pte.
+ * For both cases, return false if rmap_walk should be stopped.
  */
 static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		unsigned long address, void *arg)
@@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
+		return false;
 
 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
@@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 	if (IS_ENABLED(CONFIG_MIGRATION) &&
 	    (flags & TTU_MIGRATION) &&
-	    is_zone_device_page(page)) {
+	    is_device_private_page(page)) {
 		swp_entry_t entry;
 		pte_t swp_pte;
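[Editorial note: point -1 of the commit message, that the overall verdict is recomputed from the mapcount rather than taken from the per-VMA returns, is visible in the caller. A minimal sketch of try_to_unmap() as it looked around v5.6; the invalid_vma and locked-walk handling are elided:]

```c
/*
 * Simplified sketch of mm/rmap.c:try_to_unmap() (~v5.6).
 * The per-VMA return value of try_to_unmap_one() only steers the
 * rmap walk; success is judged purely by the final mapcount.
 */
bool try_to_unmap(struct page *page, enum ttu_flags flags)
{
	struct rmap_walk_control rwc = {
		.rmap_one = try_to_unmap_one,
		.arg = (void *)flags,
		.done = page_mapcount_is_zero,
		.anon_lock = page_lock_anon_vma_read,
	};

	rmap_walk(page, &rwc);

	/* True only if every mapping was actually torn down. */
	return !page_mapcount(page) ? true : false;
}
```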
This patch is a pure code refinement without any functional changes.

try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
unmap, if try_to_unmap_one() returns true, it means the pte has been torn
down and the mapcount decremented. Apparently the current code

	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
	    is_zone_device_page(page) && !is_device_private_page(page))
		return true;

conflicts with this logic.

Furthermore, as for zone_device, migration can only happen on
is_device_private_page(page). For other zone_device pages,
memmap_init_zone_device() raises an extra _refcount on all of the zone's
pages, and this extra _refcount will block migration. So try_to_unmap_one()
can simply return false for such zone_device pages.

Why the original code happens to work on !private zone_device pages:
-1. if the page is mapped, try_to_unmap_one()->page_remove_rmap() is skipped,
    and finally try_to_unmap(){ return !page_mapcount(page) ? true : false;}
    will return false.
-2. if the page is not mapped, the extra _refcount will prevent the migration.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-mm@kvack.org
---
v1 -> v2: improve commit log and note in code

 mm/rmap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--
2.7.5
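[Editorial note: point -2 of the commit message relies on the migration core rather than on rmap. The following is a hedged sketch, loosely modeled on mm/migrate.c's expected_page_refs() and migrate_page_move_mapping() of that era; real function names are kept but the bodies are simplified:]

```c
/*
 * Sketch of the migration refcount check (~v5.6 mm/migrate.c).
 * Only device *private* pages have their extra ZONE_DEVICE reference
 * accounted in the expected count; the extra reference that
 * memmap_init_zone_device() leaves on other ZONE_DEVICE pages is
 * therefore "unexpected" and makes migration bail out.
 */
static int expected_page_refs(struct address_space *mapping,
			      struct page *page)
{
	int expected_count = 1;

	/* device-private pages carry one extra ZONE_DEVICE reference */
	expected_count += is_device_private_page(page);
	if (mapping)
		expected_count += hpage_nr_pages(page) +
				  page_has_private(page);
	return expected_count;
}

/* ... in migrate_page_move_mapping(), for a page with no mapping: */
if (page_count(page) != expected_count)
	return -EAGAIN;	/* unexpected extra reference blocks migration */
```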