Message ID | 3b01af093515ce2960ac39bb16ff77473150d179.1661309831.git-series.apopple@nvidia.com (mailing list archive)
---|---
State | New
Series | [v3,1/3] mm/migrate_device.c: Flush TLB while holding PTL
On 24.08.22 05:03, Alistair Popple wrote:
> When clearing a PTE the TLB should be flushed whilst still holding the
> PTL to avoid a potential race with madvise/munmap/etc. For example
> consider the following sequence:
>
> CPU0                            CPU1
> ----                            ----
>
> migrate_vma_collect_pmd()
> pte_unmap_unlock()
>                                 madvise(MADV_DONTNEED)
>                                 -> zap_pte_range()
>                                 pte_offset_map_lock()
>                                 [ PTE not present, TLB not flushed ]
>                                 pte_unmap_unlock()
>                                 [ page is still accessible via stale TLB ]
> flush_tlb_range()
>
> In this case the page may still be accessed via the stale TLB entry
> after madvise returns. Fix this by flushing the TLB while holding the
> PTL.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reported-by: Nadav Amit <nadav.amit@gmail.com>
> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
> Cc: stable@vger.kernel.org
>
> ---
>
> Changes for v3:
>
> - New for v3
> ---
>  mm/migrate_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 27fb37d..6a5ef9f 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  		migrate->dst[migrate->npages] = 0;
>  		migrate->src[migrate->npages++] = mpfn;
>  	}
> -	arch_leave_lazy_mmu_mode();
> -	pte_unmap_unlock(ptep - 1, ptl);
>
>  	/* Only flush the TLB if we actually modified any entries */
>  	if (unmapped)
>  		flush_tlb_range(walk->vma, start, end);
>
> +	arch_leave_lazy_mmu_mode();
> +	pte_unmap_unlock(ptep - 1, ptl);
> +
>  	return 0;
>  }
>
>
> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136

I'm not a TLB-flushing expert, but this matches my understanding (and a
TLB flushing Linux documentation I stumbled over some while ago but
cannot quickly find).

In the ordinary try_to_migrate_one() path, flushing would happen via
ptep_clear_flush() (just like we do for the anon_exclusive case here as
well), correct?
David Hildenbrand <david@redhat.com> writes:

> On 24.08.22 05:03, Alistair Popple wrote:
>> When clearing a PTE the TLB should be flushed whilst still holding the
>> PTL to avoid a potential race with madvise/munmap/etc. For example
>> consider the following sequence:
>>
>> CPU0                            CPU1
>> ----                            ----
>>
>> migrate_vma_collect_pmd()
>> pte_unmap_unlock()
>>                                 madvise(MADV_DONTNEED)
>>                                 -> zap_pte_range()
>>                                 pte_offset_map_lock()
>>                                 [ PTE not present, TLB not flushed ]
>>                                 pte_unmap_unlock()
>>                                 [ page is still accessible via stale TLB ]
>> flush_tlb_range()
>>
>> In this case the page may still be accessed via the stale TLB entry
>> after madvise returns. Fix this by flushing the TLB while holding the
>> PTL.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Reported-by: Nadav Amit <nadav.amit@gmail.com>
>> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
>> Cc: stable@vger.kernel.org
>>
>> ---
>>
>> Changes for v3:
>>
>> - New for v3
>> ---
>>  mm/migrate_device.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d..6a5ef9f 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>  		migrate->dst[migrate->npages] = 0;
>>  		migrate->src[migrate->npages++] = mpfn;
>>  	}
>> -	arch_leave_lazy_mmu_mode();
>> -	pte_unmap_unlock(ptep - 1, ptl);
>>
>>  	/* Only flush the TLB if we actually modified any entries */
>>  	if (unmapped)
>>  		flush_tlb_range(walk->vma, start, end);
>>
>> +	arch_leave_lazy_mmu_mode();
>> +	pte_unmap_unlock(ptep - 1, ptl);
>> +
>>  	return 0;
>>  }
>>
>>
>> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
>
> I'm not a TLB-flushing expert, but this matches my understanding (and a
> TLB flushing Linux documentation I stumbled over some while ago but
> cannot quickly find).
>
> In the ordinary try_to_migrate_one() path, flushing would happen via
> ptep_clear_flush() (just like we do for the anon_exclusive case here as
> well), correct?

Correct.
On 24.08.22 14:26, Alistair Popple wrote:
>
> David Hildenbrand <david@redhat.com> writes:
>
>> On 24.08.22 05:03, Alistair Popple wrote:
>>> When clearing a PTE the TLB should be flushed whilst still holding the
>>> PTL to avoid a potential race with madvise/munmap/etc. For example
>>> consider the following sequence:
>>>
>>> CPU0                            CPU1
>>> ----                            ----
>>>
>>> migrate_vma_collect_pmd()
>>> pte_unmap_unlock()
>>>                                 madvise(MADV_DONTNEED)
>>>                                 -> zap_pte_range()
>>>                                 pte_offset_map_lock()
>>>                                 [ PTE not present, TLB not flushed ]
>>>                                 pte_unmap_unlock()
>>>                                 [ page is still accessible via stale TLB ]
>>> flush_tlb_range()
>>>
>>> In this case the page may still be accessed via the stale TLB entry
>>> after madvise returns. Fix this by flushing the TLB while holding the
>>> PTL.
>>>
>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>> Reported-by: Nadav Amit <nadav.amit@gmail.com>
>>> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
>>> Cc: stable@vger.kernel.org
>>>
>>> ---
>>>
>>> Changes for v3:
>>>
>>> - New for v3
>>> ---
>>>  mm/migrate_device.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>> index 27fb37d..6a5ef9f 100644
>>> --- a/mm/migrate_device.c
>>> +++ b/mm/migrate_device.c
>>> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>  		migrate->dst[migrate->npages] = 0;
>>>  		migrate->src[migrate->npages++] = mpfn;
>>>  	}
>>> -	arch_leave_lazy_mmu_mode();
>>> -	pte_unmap_unlock(ptep - 1, ptl);
>>>
>>>  	/* Only flush the TLB if we actually modified any entries */
>>>  	if (unmapped)
>>>  		flush_tlb_range(walk->vma, start, end);
>>>
>>> +	arch_leave_lazy_mmu_mode();
>>> +	pte_unmap_unlock(ptep - 1, ptl);
>>> +
>>>  	return 0;
>>>  }
>>>
>>>
>>> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
>>
>> I'm not a TLB-flushing expert, but this matches my understanding (and a
>> TLB flushing Linux documentation I stumbled over some while ago but
>> cannot quickly find).
>>
>> In the ordinary try_to_migrate_one() path, flushing would happen via
>> ptep_clear_flush() (just like we do for the anon_exclusive case here as
>> well), correct?
>
> Correct.
>

Acked-by: David Hildenbrand <david@redhat.com>
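To make the comparison in this exchange concrete, here is a minimal kernel-style sketch of the two flushing patterns. It is illustrative only: the loop structure and variable names are assumed, not copied from mm/migrate_device.c, and the fragment is not standalone compilable code.

```c
/*
 * Pattern 1: per-PTE flush, as in try_to_migrate_one() and the
 * anon_exclusive case of migrate_vma_collect_pmd(). The PTE is
 * cleared and its TLB entry flushed in one step, under the PTL.
 */
pte = ptep_clear_flush(vma, addr, ptep);

/*
 * Pattern 2: batched flush, as in migrate_vma_collect_pmd() after
 * this patch. Each PTE is cleared without flushing, and one ranged
 * flush covers them all -- issued before the PTL is dropped, which
 * is exactly what closes the race described in the changelog.
 */
for (; addr < end; addr += PAGE_SIZE, ptep++) {
	pte = ptep_get_and_clear(mm, addr, ptep);
	/* ... collect the page for migration ... */
	unmapped++;
}

/* Only flush the TLB if we actually modified any entries */
if (unmapped)
	flush_tlb_range(walk->vma, start, end);

arch_leave_lazy_mmu_mode();
pte_unmap_unlock(ptep - 1, ptl);	/* ptep was advanced past the last entry */
```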
Alistair Popple <apopple@nvidia.com> writes:

> When clearing a PTE the TLB should be flushed whilst still holding the
> PTL to avoid a potential race with madvise/munmap/etc. For example
> consider the following sequence:
>
> CPU0                            CPU1
> ----                            ----
>
> migrate_vma_collect_pmd()
> pte_unmap_unlock()
>                                 madvise(MADV_DONTNEED)
>                                 -> zap_pte_range()
>                                 pte_offset_map_lock()
>                                 [ PTE not present, TLB not flushed ]
>                                 pte_unmap_unlock()
>                                 [ page is still accessible via stale TLB ]
> flush_tlb_range()
>
> In this case the page may still be accessed via the stale TLB entry
> after madvise returns. Fix this by flushing the TLB while holding the
> PTL.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reported-by: Nadav Amit <nadav.amit@gmail.com>
> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
> Cc: stable@vger.kernel.org
>
> ---
>
> Changes for v3:
>
> - New for v3
> ---
>  mm/migrate_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 27fb37d..6a5ef9f 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  		migrate->dst[migrate->npages] = 0;
>  		migrate->src[migrate->npages++] = mpfn;
>  	}
> -	arch_leave_lazy_mmu_mode();
> -	pte_unmap_unlock(ptep - 1, ptl);
>
>  	/* Only flush the TLB if we actually modified any entries */
>  	if (unmapped)
>  		flush_tlb_range(walk->vma, start, end);

It appears that we can increase "unmapped" only if ptep_get_and_clear()
is used?

Best Regards,
Huang, Ying

> +	arch_leave_lazy_mmu_mode();
> +	pte_unmap_unlock(ptep - 1, ptl);
> +
>  	return 0;
>  }
>
>
> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
"Huang, Ying" <ying.huang@intel.com> writes: > Alistair Popple <apopple@nvidia.com> writes: > >> When clearing a PTE the TLB should be flushed whilst still holding the >> PTL to avoid a potential race with madvise/munmap/etc. For example >> consider the following sequence: >> >> CPU0 CPU1 >> ---- ---- >> >> migrate_vma_collect_pmd() >> pte_unmap_unlock() >> madvise(MADV_DONTNEED) >> -> zap_pte_range() >> pte_offset_map_lock() >> [ PTE not present, TLB not flushed ] >> pte_unmap_unlock() >> [ page is still accessible via stale TLB ] >> flush_tlb_range() >> >> In this case the page may still be accessed via the stale TLB entry >> after madvise returns. Fix this by flushing the TLB while holding the >> PTL. >> >> Signed-off-by: Alistair Popple <apopple@nvidia.com> >> Reported-by: Nadav Amit <nadav.amit@gmail.com> >> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages") >> Cc: stable@vger.kernel.org >> >> --- >> >> Changes for v3: >> >> - New for v3 >> --- >> mm/migrate_device.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c >> index 27fb37d..6a5ef9f 100644 >> --- a/mm/migrate_device.c >> +++ b/mm/migrate_device.c >> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, >> migrate->dst[migrate->npages] = 0; >> migrate->src[migrate->npages++] = mpfn; >> } >> - arch_leave_lazy_mmu_mode(); >> - pte_unmap_unlock(ptep - 1, ptl); >> >> /* Only flush the TLB if we actually modified any entries */ >> if (unmapped) >> flush_tlb_range(walk->vma, start, end); > > It appears that we can increase "unmapped" only if ptep_get_and_clear() > is used? In other words you mean we only need to increase unmapped if pte_present && !anon_exclusive? Agree, that's a good optimisation to make. However I'm just trying to solve a data corruption issue (not dirtying the page) here, so will post that as a separate optimisation patch. Thanks. - Alistair > Best Regards, > Huang, Ying > >> + arch_leave_lazy_mmu_mode(); >> + pte_unmap_unlock(ptep - 1, ptl); >> + >> return 0; >> } >> >> >> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
Alistair Popple <apopple@nvidia.com> writes:

> "Huang, Ying" <ying.huang@intel.com> writes:
>
>> Alistair Popple <apopple@nvidia.com> writes:
>>
>>> When clearing a PTE the TLB should be flushed whilst still holding the
>>> PTL to avoid a potential race with madvise/munmap/etc. For example
>>> consider the following sequence:
>>>
>>> CPU0                            CPU1
>>> ----                            ----
>>>
>>> migrate_vma_collect_pmd()
>>> pte_unmap_unlock()
>>>                                 madvise(MADV_DONTNEED)
>>>                                 -> zap_pte_range()
>>>                                 pte_offset_map_lock()
>>>                                 [ PTE not present, TLB not flushed ]
>>>                                 pte_unmap_unlock()
>>>                                 [ page is still accessible via stale TLB ]
>>> flush_tlb_range()
>>>
>>> In this case the page may still be accessed via the stale TLB entry
>>> after madvise returns. Fix this by flushing the TLB while holding the
>>> PTL.
>>>
>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>> Reported-by: Nadav Amit <nadav.amit@gmail.com>
>>> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
>>> Cc: stable@vger.kernel.org
>>>
>>> ---
>>>
>>> Changes for v3:
>>>
>>> - New for v3
>>> ---
>>>  mm/migrate_device.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>> index 27fb37d..6a5ef9f 100644
>>> --- a/mm/migrate_device.c
>>> +++ b/mm/migrate_device.c
>>> @@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>  		migrate->dst[migrate->npages] = 0;
>>>  		migrate->src[migrate->npages++] = mpfn;
>>>  	}
>>> -	arch_leave_lazy_mmu_mode();
>>> -	pte_unmap_unlock(ptep - 1, ptl);
>>>
>>>  	/* Only flush the TLB if we actually modified any entries */
>>>  	if (unmapped)
>>>  		flush_tlb_range(walk->vma, start, end);
>>
>> It appears that we can increase "unmapped" only if ptep_get_and_clear()
>> is used?
>
> In other words you mean we only need to increase unmapped if pte_present
> && !anon_exclusive?
>
> Agree, that's a good optimisation to make. However I'm just trying to
> solve a data corruption issue (not dirtying the page) here, so will post
> that as a separate optimisation patch. Thanks.

OK. Then the patch looks good to me. Feel free to add my,

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying

>>
>>> +	arch_leave_lazy_mmu_mode();
>>> +	pte_unmap_unlock(ptep - 1, ptl);
>>> +
>>>  	return 0;
>>>  }
>>>
>>>
>>> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
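The separate optimisation patch agreed on above would presumably look something like the sketch below. This is a hypothetical fragment, not the actual follow-up; the branch structure of migrate_vma_collect_pmd() is only approximated, and the real function has additional cases (e.g. non-present device-private entries) that are omitted here.

```c
if (anon_exclusive) {
	/*
	 * ptep_clear_flush() flushes the TLB entry for this address
	 * itself, so it adds nothing to the deferred
	 * flush_tlb_range() work.
	 */
	pte = ptep_clear_flush(vma, addr, ptep);
} else {
	/*
	 * Cleared without a flush: a stale TLB entry may remain
	 * until flush_tlb_range() runs, so only these PTEs need to
	 * be counted towards "unmapped".
	 */
	pte = ptep_get_and_clear(mm, addr, ptep);
	unmapped++;
}
```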
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d..6a5ef9f 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		migrate->dst[migrate->npages] = 0;
 		migrate->src[migrate->npages++] = mpfn;
 	}
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(ptep - 1, ptl);

 	/* Only flush the TLB if we actually modified any entries */
 	if (unmapped)
 		flush_tlb_range(walk->vma, start, end);

+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(ptep - 1, ptl);
+
 	return 0;
 }
When clearing a PTE the TLB should be flushed whilst still holding the
PTL to avoid a potential race with madvise/munmap/etc. For example
consider the following sequence:

	CPU0                            CPU1
	----                            ----

	migrate_vma_collect_pmd()
	pte_unmap_unlock()
	                                madvise(MADV_DONTNEED)
	                                -> zap_pte_range()
	                                pte_offset_map_lock()
	                                [ PTE not present, TLB not flushed ]
	                                pte_unmap_unlock()
	                                [ page is still accessible via stale TLB ]
	flush_tlb_range()

In this case the page may still be accessed via the stale TLB entry
after madvise returns. Fix this by flushing the TLB while holding the
PTL.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Cc: stable@vger.kernel.org

---

Changes for v3:

 - New for v3
---
 mm/migrate_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
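In short, the fix is purely an ordering change at the tail of migrate_vma_collect_pmd(). A condensed before/after view of the diff above (illustrative fragments, not verbatim source):

```c
/* Before: the PTL is dropped before the flush, leaving a window in
 * which a concurrent zap_pte_range() can take the lock, find the PTE
 * already cleared, skip its own flush and return -- while this CPU's
 * stale TLB entry still maps the page. */
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(ptep - 1, ptl);
if (unmapped)
	flush_tlb_range(walk->vma, start, end);

/* After: the ranged flush is issued while the PTL is still held, so
 * anyone who can observe the cleared PTE is guaranteed the stale
 * translations are already gone. */
if (unmapped)
	flush_tlb_range(walk->vma, start, end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(ptep - 1, ptl);
```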