Message ID | 1f7e1d083864fbb17a20a9c8349d2e8b427e20a3.1628174413.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New |
Series | Some cleanup for page migration |
On Thu, Aug 05, 2021 at 11:05:56PM +0800, Baolin Wang wrote:
> We've got the expected count for anonymous page or file page by
> expected_page_refs() at the beginning of migrate_page_move_mapping(),
> thus we should move the page count validation a little forward to
> reduce duplicated code.

Please add an explanation to the changelog for why it's safe to pull
this out from under the i_pages lock.
Hi Matthew,

> On Thu, Aug 05, 2021 at 11:05:56PM +0800, Baolin Wang wrote:
>> We've got the expected count for anonymous page or file page by
>> expected_page_refs() at the beginning of migrate_page_move_mapping(),
>> thus we should move the page count validation a little forward to
>> reduce duplicated code.
>
> Please add an explanation to the changelog for why it's safe to pull
> this out from under the i_pages lock.

Sure. In folio_migrate_mapping(), we are sure that the page being migrated
has been isolated from the LRU list and is locked, so I think there is no
race in reading the page count without the i_pages lock. Please correct me
if I missed something else. Thanks.
On Fri, Aug 06, 2021 at 11:07:18AM +0800, Baolin Wang wrote:
> Sure. In folio_migrate_mapping(), we are sure that the page being migrated
> has been isolated from the LRU list and is locked, so I think there is no
> race in reading the page count without the i_pages lock. Please correct me
> if I missed something else. Thanks.

Unless the page has been removed from i_pages, this isn't a correct
explanation. Even if it has been removed from i_pages, unless an
RCU grace period has passed, another CPU may still be able to inc the
refcount on it (temporarily). The same is true for the page tables,
by the way; if someone is using get_user_pages_fast(), they may still
be able to see the page.
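Matthew's point rests on the fact that a CPU which reached the page through a lockless lookup (an RCU walk of i_pages, or a GUP-fast page table walk) can still take a reference as long as the count is non-zero, without ever touching the i_pages lock. A minimal userspace sketch of that "get unless zero" pattern, assuming C11 atomics stand in for the kernel's atomic_t; `struct fake_page`, `try_get_ref()` and `put_ref()` are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count matters. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Speculative reference: succeed only while the count is non-zero.
 * A lockless walker may hold a pointer to a page it has no reference on
 * yet, so it must never bring a page back from a count of zero.
 */
static bool try_get_ref(struct fake_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with try_get_ref(). */
static void put_ref(struct fake_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0); /* catch underflow in this sketch */
}
```

Between `try_get_ref()` and `put_ref()`, any other thread reading `refcount` sees the extra reference, however briefly; that is the transient increment Matthew describes.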
Hi,

> Unless the page has been removed from i_pages, this isn't a correct
> explanation. Even if it has been removed from i_pages, unless an
> RCU grace period has passed, another CPU may still be able to inc the
> refcount on it (temporarily). The same is true for the page tables,
> by the way; if someone is using get_user_pages_fast(), they may still
> be able to see the page.

I don't think this is an issue, because by now we've established a migration
PTE for this page under the page lock. If a user wants to get the page via
get_user_pages_fast(), it will wait in migration_entry_wait() for the page
migration to finish. So I still think there is no need to check the page
count under the i_pages lock.
On Sun, Aug 08, 2021 at 10:55:30AM +0800, Baolin Wang wrote:
> I don't think this is an issue, because by now we've established a migration
> PTE for this page under the page lock. If a user wants to get the page via
> get_user_pages_fast(), it will wait in migration_entry_wait() for the page
> migration to finish. So I still think there is no need to check the page
> count under the i_pages lock.

I don't know whether the patch is correct or not, but you aren't nearly
paranoid enough.
Consider this sequence of events:

CPU 0:                                  CPU 1:
get_user_pages_fast()
lockless_pages_from_mm()
local_irq_save()
gup_pgd_range()
gup_p4d_range()
gup_pud_range()
gup_pmd_range()
gup_pte_range()
pte_t pte = ptep_get_lockless(ptep);
                                        migrate_vma_collect_pmd()
                                        ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl)
                                        ptep_get_and_clear(mm, addr, ptep);
page = pte_page(pte);
                                        set_pte_at(mm, addr, ptep, swp_pte);
                                        migrate_page_move_mapping()
head = try_grab_compound_head(page, 1, flags);

... now page's refcount is temporarily higher than it should be. CPU 0
will notice the PTE is no longer the PTE that it used to be and drop
the reference, but in the meantime, CPU 1 can observe the higher refcount.

None of this has anything to do with the i_pages lock. Holding it does
not protect from this race, but you need to know this kind of thing to
decide if changing how we test a page's refcount is safe or not.
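The interleaving above can be replayed deterministically in userspace to make the window concrete. This is a sketch under stated assumptions: C11 atomics model the PTE and the refcount, a single thread executes both CPUs' steps in the order listed, and `struct model`, `replay_interleaving()` and the PTE values are all hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the shared state both CPUs touch. */
struct model {
	atomic_ulong pte;    /* stands in for *ptep */
	atomic_int refcount; /* stands in for the page's refcount */
};

/*
 * Replays the interleaving step by step on one thread, so the result is
 * deterministic. Returns the refcount that CPU 1's check would read right
 * after CPU 0's speculative grab; *final_count receives the count after
 * CPU 0 has noticed the changed PTE and backed off.
 */
static int replay_interleaving(int expected, int *final_count)
{
	const unsigned long present_pte = 0x1001UL;   /* hypothetical value */
	const unsigned long migration_pte = 0x2000UL; /* hypothetical value */
	struct model m;

	/* Model assumption: the migration path itself holds a reference. */
	assert(expected >= 1);
	atomic_init(&m.pte, present_pte);
	atomic_init(&m.refcount, expected);

	/* CPU 0, gup_pte_range(): lockless read of the PTE. */
	unsigned long pte = atomic_load(&m.pte);

	/* CPU 1: clear the PTE and install a migration entry. */
	atomic_store(&m.pte, migration_pte);

	/* CPU 0: speculative grab based on the now-stale PTE value. */
	atomic_fetch_add(&m.refcount, 1);

	/* CPU 1, inside the migration path: read the refcount. */
	int observed = atomic_load(&m.refcount);

	/* CPU 0: re-validate the PTE; it changed, so drop the reference. */
	if (pte != atomic_load(&m.pte))
		atomic_fetch_sub(&m.refcount, 1);

	*final_count = atomic_load(&m.refcount);
	return observed;
}
```

With an expected count of 2, the replay returns 3 and leaves the final count at 2: the extra reference is real for a short time even though CPU 0 later backs off. None of CPU 0's steps take the i_pages lock, which is why holding it on CPU 1 does not close this window.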
On 2021/8/8 18:26, Matthew Wilcox wrote:
> I don't know whether the patch is correct or not, but you aren't nearly
> paranoid enough. Consider this sequence of events:

Thanks for describing this scenario.
> CPU 0:                                  CPU 1:
> get_user_pages_fast()
> lockless_pages_from_mm()
> local_irq_save()
> gup_pgd_range()
> gup_p4d_range()
> gup_pud_range()
> gup_pmd_range()
> gup_pte_range()
> pte_t pte = ptep_get_lockless(ptep);
>                                         migrate_vma_collect_pmd()
>                                         ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl)
>                                         ptep_get_and_clear(mm, addr, ptep);
> page = pte_page(pte);
>                                         set_pte_at(mm, addr, ptep, swp_pte);
>                                         migrate_page_move_mapping()
> head = try_grab_compound_head(page, 1, flags);

On CPU0, after grabbing the page reference, it will validate the PTE
again. If a swap PTE has been established for this page, it will drop
the reference and go to the slow path:

	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		put_compound_head(head, 1, flags);
		goto pte_unmap;
	}

So CPU1 cannot observe the abnormally high refcount in this case, unless
I missed something.

> ... now page's refcount is temporarily higher than it should be. CPU 0
> will notice the PTE is no longer the PTE that it used to be and drop
> the reference, but in the meantime, CPU 1 can observe the higher refcount.
>
> None of this has anything to do with the i_pages lock. Holding it does

Yes, the i_pages lock does not guarantee anything about the page count,
so I think we can move the check out from under the i_pages lock.

> not protect from this race, but you need to know this kind of thing to
> decide if changing how we test a page's refcount is safe or not.

Yes, I will keep checking whether there are any races when validating
the page count. Any suggestions are welcome.
On Sun, Aug 08, 2021 at 11:13:28PM +0800, Baolin Wang wrote:
> On CPU0, after grabbing the page reference, it will validate the PTE
> again. If a swap PTE has been established for this page, it will drop
> the reference and go to the slow path:
>
> 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> 		put_compound_head(head, 1, flags);
> 		goto pte_unmap;
> 	}
>
> So CPU1 cannot observe the abnormally high refcount in this case, unless
> I missed something.

This is a race between CPUs. There is no synchronisation between them,
so CPU 1 can absolutely see the refcount higher temporarily. Yes,
CPU 0 will eventually put the refcount, but CPU 1 can observe it high.
On 2021/8/9 0:01, Matthew Wilcox wrote:
> This is a race between CPUs. There is no synchronisation between them,
> so CPU 1 can absolutely see the refcount higher temporarily. Yes,
> CPU 0 will eventually put the refcount, but CPU 1 can observe it high.

OK, I understand your concern. I agree CPU 1 can observe the refcount
temporarily higher, but if migrate_page_move_mapping() has already passed
the page count validation, it will consider the page mapping migratable,
since CPU0 will eventually fail and go to the slow path. If CPU0 increases
the page count after the page_count() validation in
migrate_page_move_mapping() on CPU1, CPU1 will then freeze the page count
to replace the mapping:
	if (!page_ref_freeze(page, expected_count)) {
		xas_unlock_irq(&xas);
		return -EAGAIN;
	}

So either CPU0 will fail to increase the page count in
try_grab_compound_head() because the count is frozen; or CPU1 will fail
to freeze the page count because CPU0 has already increased it, which
aborts the migration; or, after CPU1 unfreezes the count, CPU0 will
increase the page count successfully but then put it again because the
PTE has changed.

So far I have not found any problem with validating the page count early
in migrate_page_move_mapping(), if I understood correctly.

But I have another question: should we use ptep_get_lockless() instead of
dereferencing *ptep directly to re-validate the PTE in gup_pte_range(), to
avoid reading a stale value?

@@ -2185,7 +2185,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			goto pte_unmap;
 		}
 
-		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+		if (unlikely(pte_val(pte) != pte_val(ptep_get_lockless(ptep)))) {
 			put_compound_head(head, 1, flags);
 			goto pte_unmap;
 		}
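The three outcomes above hinge on the freeze being a single compare-and-exchange from the expected count to zero, and on the speculative grab refusing a zero count. A minimal userspace model of that interaction, assuming C11 atomics; `ref_freeze()`, `ref_unfreeze()` and `try_get_ref()` are hypothetical stand-ins for page_ref_freeze(), page_ref_unfreeze() and the GUP-fast grab, not the kernel implementations:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page refcount. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Freeze: atomically replace 'expected' with 0, succeeding only if no one
 * holds an extra reference at that instant (cf. page_ref_freeze()).
 */
static bool ref_freeze(struct fake_page *page, int expected)
{
	/* 'expected' is a local copy, so the CAS may overwrite it freely. */
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}

/* Unfreeze: publish the new count once the mapping has been replaced. */
static void ref_unfreeze(struct fake_page *page, int count)
{
	assert(atomic_load(&page->refcount) == 0);
	atomic_store(&page->refcount, count);
}

/* Speculative get that refuses a zero (frozen) count. */
static bool try_get_ref(struct fake_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

Whichever atomic operation lands first decides the outcome: if the grab wins, the freeze sees an unexpected count and migration returns -EAGAIN; if the freeze wins, the grab fails until the count is unfrozen, after which it succeeds and is dropped again once the changed PTE is noticed.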
diff --git a/mm/migrate.c b/mm/migrate.c
index 239b238..5559571 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -386,11 +386,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int expected_count = expected_page_refs(mapping, &folio->page) + extra_count;
 	long nr = folio_nr_pages(folio);
 
-	if (!mapping) {
-		/* Anonymous page without mapping */
-		if (folio_ref_count(folio) != expected_count)
-			return -EAGAIN;
+	if (folio_ref_count(folio) != expected_count)
+		return -EAGAIN;
 
+	if (!mapping) {
 		/* No turning back from here */
 		newfolio->index = folio->index;
 		newfolio->mapping = folio->mapping;
@@ -404,8 +403,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	newzone = folio_zone(newfolio);
 
 	xas_lock_irq(&xas);
-	if (folio_ref_count(folio) != expected_count ||
-	    xas_load(&xas) != folio) {
+	if (xas_load(&xas) != folio) {
 		xas_unlock_irq(&xas);
 		return -EAGAIN;
 	}
We've got the expected count for anonymous page or file page by
expected_page_refs() at the beginning of migrate_page_move_mapping(),
thus we should move the page count validation a little forward to
reduce duplicated code.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/migrate.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
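To see exactly what the reordering in this patch changes, the two versions of the checks can be written as pure functions over a snapshot of the state folio_migrate_mapping() reads. This is a hypothetical model: `struct snapshot`, `checks_before()` and `checks_after()` are illustrative names, a return of 0 means "the checks passed and migration would continue", and everything after the checks (including the refcount freeze) is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot: whether a mapping exists, the refcount at function
 * entry, the refcount once xas_lock_irq() is held, and whether xas_load()
 * still returns this folio.
 */
struct snapshot {
	bool has_mapping;
	int count_at_entry;
	int count_under_lock;
	bool slot_matches;
};

/* Before the patch: anonymous pages checked early, file pages under the lock. */
static int checks_before(const struct snapshot *s, int expected)
{
	assert(expected >= 1);
	if (!s->has_mapping)
		return s->count_at_entry != expected ? -EAGAIN : 0;
	if (s->count_under_lock != expected || !s->slot_matches)
		return -EAGAIN;
	return 0;
}

/* After the patch: one early count check; only the slot check stays locked. */
static int checks_after(const struct snapshot *s, int expected)
{
	assert(expected >= 1);
	if (s->count_at_entry != expected)
		return -EAGAIN;
	if (!s->has_mapping)
		return 0;
	if (!s->slot_matches)
		return -EAGAIN;
	return 0;
}
```

For a file page whose count is 2 at entry but 3 once the lock is held (a reference taken in between), checks_before() returns -EAGAIN while checks_after() returns 0, so in the real function such a late reference would be caught only by the subsequent refcount freeze, provided it is still held at that point. The reverse case, a transient reference that is gone by the time the lock is taken, now causes an -EAGAIN retry that the old ordering would have avoided.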