Message ID | f5e3b77c5a4c646e000ffadbf6c3db0531a01795.1650810915.git.baolin.wang@linux.alibaba.com (mailing list archive)
---|---
State | New
Series | Fix cache flush issues considering PMD sharing
On 4/24/22 07:50, Baolin Wang wrote:
> The cache level flush will always be first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
>
> As Muchun pointed out[1], the architectures which support hugetlb
> PMD sharing currently have no cache flush issues in practice. But I
> think we should still follow the cache/TLB flushing rules when changing
> a valid virtual address mapping, in case of potential issues in the
> future.
>
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..81872bb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                           * do this outside rmap routines.
>                           */
>                          VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +                        /*
> +                         * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?

> +                         * There is no way of knowing exactly which PMDs may
> +                         * be cached for this mm, so we must flush them all.
> +                         * start/end were already adjusted above to cover this
> +                         * range.
> +                         */
> +                        flush_cache_range(vma, range.start, range.end);
> +
>                          if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -                                /*
> -                                 * huge_pmd_unshare unmapped an entire PMD
> -                                 * page. There is no way of knowing exactly
> -                                 * which PMDs may be cached for this mm, so
> -                                 * we must flush them all. start/end were
> -                                 * already adjusted above to cover this range.
> -                                 */
> -                                flush_cache_range(vma, range.start, range.end);
>                                  flush_tlb_range(vma, range.start, range.end);
>                                  mmu_notifier_invalidate_range(mm, range.start,
>                                                                range.end);
> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                                  page_vma_mapped_walk_done(&pvmw);
>                                  break;
>                          }
> +                } else {
> +                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change. But
when looking at this now, I wonder how hugetlb pages are handled. Are
there any versions of flush_cache_page() that take page size into
account?
On 4/26/2022 8:20 AM, Mike Kravetz wrote:
> On 4/24/22 07:50, Baolin Wang wrote:
>> The cache level flush will always be first when changing an existing
>> virtual->physical mapping to a new value, since this allows us to
>> properly handle systems whose caches are strict and require a
>> virtual->physical translation to exist for a virtual address. So we
>> should move the cache flushing before huge_pmd_unshare().
>>
>> As Muchun pointed out[1], the architectures which support hugetlb
>> PMD sharing currently have no cache flush issues in practice. But I
>> think we should still follow the cache/TLB flushing rules when changing
>> a valid virtual address mapping, in case of potential issues in the
>> future.
>>
>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>  1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 61e63db..81872bb 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>                           * do this outside rmap routines.
>>                           */
>>                          VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +                        /*
>> +                         * huge_pmd_unshare unmapped an entire PMD page.
>
> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
> an entire PMD page?

Sure, will do.

>
>> +                         * There is no way of knowing exactly which PMDs may
>> +                         * be cached for this mm, so we must flush them all.
>> +                         * start/end were already adjusted above to cover this
>> +                         * range.
>> +                         */
>> +                        flush_cache_range(vma, range.start, range.end);
>> +
>>                          if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> -                                /*
>> -                                 * huge_pmd_unshare unmapped an entire PMD
>> -                                 * page. There is no way of knowing exactly
>> -                                 * which PMDs may be cached for this mm, so
>> -                                 * we must flush them all. start/end were
>> -                                 * already adjusted above to cover this range.
>> -                                 */
>> -                                flush_cache_range(vma, range.start, range.end);
>>                                  flush_tlb_range(vma, range.start, range.end);
>>                                  mmu_notifier_invalidate_range(mm, range.start,
>>                                                                range.end);
>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>                                  page_vma_mapped_walk_done(&pvmw);
>>                                  break;
>>                          }
>> +                } else {
>> +                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>
> I know this call to flush_cache_page() existed before your change. But
> when looking at this now, I wonder how hugetlb pages are handled. Are
> there any versions of flush_cache_page() that take page size into
> account?

Thanks for the reminder. I checked the flush_cache_page() implementation
on some architectures (like arm32): they do not take hugetlb pages into
account, so we may miss flushing the whole huge page's cache on some
architectures.

With this patch we mitigate that issue, since we switch to
flush_cache_range() to cover the possible range that needs a cache flush
for hugetlb pages. But for anon hugetlb pages we should also convert to
flush_cache_range(). I think we can do that conversion in a separate
patch set, after checking all the places that use flush_cache_page() to
flush the cache for hugetlb pages. What do you think?
On 4/25/22 23:26, Baolin Wang wrote:
>
>
> On 4/26/2022 8:20 AM, Mike Kravetz wrote:
>> On 4/24/22 07:50, Baolin Wang wrote:
>>> The cache level flush will always be first when changing an existing
>>> virtual->physical mapping to a new value, since this allows us to
>>> properly handle systems whose caches are strict and require a
>>> virtual->physical translation to exist for a virtual address. So we
>>> should move the cache flushing before huge_pmd_unshare().
>>>
>>> As Muchun pointed out[1], the architectures which support hugetlb
>>> PMD sharing currently have no cache flush issues in practice. But I
>>> think we should still follow the cache/TLB flushing rules when changing
>>> a valid virtual address mapping, in case of potential issues in the
>>> future.
>>>
>>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>>  1 file changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 61e63db..81872bb 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                           * do this outside rmap routines.
>>>                           */
>>>                          VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>>> +                        /*
>>> +                         * huge_pmd_unshare unmapped an entire PMD page.
>>
>> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
>> an entire PMD page?
>
> Sure, will do.
>
>>
>>> +                         * There is no way of knowing exactly which PMDs may
>>> +                         * be cached for this mm, so we must flush them all.
>>> +                         * start/end were already adjusted above to cover this
>>> +                         * range.
>>> +                         */
>>> +                        flush_cache_range(vma, range.start, range.end);
>>> +
>>>                          if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>>> -                                /*
>>> -                                 * huge_pmd_unshare unmapped an entire PMD
>>> -                                 * page. There is no way of knowing exactly
>>> -                                 * which PMDs may be cached for this mm, so
>>> -                                 * we must flush them all. start/end were
>>> -                                 * already adjusted above to cover this range.
>>> -                                 */
>>> -                                flush_cache_range(vma, range.start, range.end);
>>>                                  flush_tlb_range(vma, range.start, range.end);
>>>                                  mmu_notifier_invalidate_range(mm, range.start,
>>>                                                                range.end);
>>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                                  page_vma_mapped_walk_done(&pvmw);
>>>                                  break;
>>>                          }
>>> +                } else {
>>> +                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>
>> I know this call to flush_cache_page() existed before your change. But
>> when looking at this now, I wonder how hugetlb pages are handled. Are
>> there any versions of flush_cache_page() that take page size into
>> account?
>
> Thanks for the reminder. I checked the flush_cache_page() implementation
> on some architectures (like arm32): they do not take hugetlb pages into
> account, so we may miss flushing the whole huge page's cache on some
> architectures.
>
> With this patch we mitigate that issue, since we switch to
> flush_cache_range() to cover the possible range that needs a cache flush
> for hugetlb pages. But for anon hugetlb pages we should also convert to
> flush_cache_range(). I think we can do that conversion in a separate
> patch set, after checking all the places that use flush_cache_page() to
> flush the cache for hugetlb pages. What do you think?

Yes, I am OK with that approach.
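A minimal sketch of the conversion discussed above, assuming a call site that currently uses flush_cache_page() on a mapping that may be a hugetlb page. The helpers used (folio_test_hugetlb(), hstate_vma(), huge_page_size()) are existing hugetlb APIs, but the surrounding placement is hypothetical and not part of this patch:

        /*
         * Hypothetical conversion sketch, not part of this patch: flush the
         * whole huge page range rather than a single base page, since
         * flush_cache_page() on some architectures (e.g. arm32) only
         * considers one base page.
         */
        if (folio_test_hugetlb(folio))
                flush_cache_range(vma, address,
                                  address + huge_page_size(hstate_vma(vma)));
        else
                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));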
diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..81872bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
                          * do this outside rmap routines.
                          */
                         VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+                        /*
+                         * huge_pmd_unshare unmapped an entire PMD page.
+                         * There is no way of knowing exactly which PMDs may
+                         * be cached for this mm, so we must flush them all.
+                         * start/end were already adjusted above to cover this
+                         * range.
+                         */
+                        flush_cache_range(vma, range.start, range.end);
+
                         if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-                                /*
-                                 * huge_pmd_unshare unmapped an entire PMD
-                                 * page. There is no way of knowing exactly
-                                 * which PMDs may be cached for this mm, so
-                                 * we must flush them all. start/end were
-                                 * already adjusted above to cover this range.
-                                 */
-                                flush_cache_range(vma, range.start, range.end);
                                 flush_tlb_range(vma, range.start, range.end);
                                 mmu_notifier_invalidate_range(mm, range.start,
                                                               range.end);
@@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
                                 page_vma_mapped_walk_done(&pvmw);
                                 break;
                         }
+                } else {
+                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
                 }

                 /*
                  * Nuke the page table entry. When having to clear
                  * PageAnonExclusive(), we always have to flush.
                  */
-                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
                 if (should_defer_flush(mm, flags) && !anon_exclusive) {
                         /*
                          * We clear the PTE but do not flush so potentially
@@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
                          * do this outside rmap routines.
                          */
                         VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+                        /*
+                         * huge_pmd_unshare unmapped an entire PMD page.
+                         * There is no way of knowing exactly which PMDs may
+                         * be cached for this mm, so we must flush them all.
+                         * start/end were already adjusted above to cover this
+                         * range.
+                         */
+                        flush_cache_range(vma, range.start, range.end);
+
                         if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-                                /*
-                                 * huge_pmd_unshare unmapped an entire PMD
-                                 * page. There is no way of knowing exactly
-                                 * which PMDs may be cached for this mm, so
-                                 * we must flush them all. start/end were
-                                 * already adjusted above to cover this range.
-                                 */
-                                flush_cache_range(vma, range.start, range.end);
                                 flush_tlb_range(vma, range.start, range.end);
                                 mmu_notifier_invalidate_range(mm, range.start,
                                                               range.end);
@@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
                                 page_vma_mapped_walk_done(&pvmw);
                                 break;
                         }
+                } else {
+                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
                 }

                 /* Nuke the page table entry. */
-                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
                 pteval = ptep_clear_flush(vma, address, pvmw.pte);

                 /* Set the dirty flag on the folio now the pte is gone. */
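Condensed, the hugetlb handling in both try_to_unmap_one() and try_to_migrate_one() after this patch has roughly the shape below. This is a simplified illustration only, with the surrounding page_vma_mapped_walk() loop, locking and error handling elided:

        if (folio_test_hugetlb(folio)) {
                /*
                 * Flush caches while the virtual->physical translation is
                 * still in place, before huge_pmd_unshare() can tear down
                 * the shared PMD page.
                 */
                flush_cache_range(vma, range.start, range.end);
                if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
                        flush_tlb_range(vma, range.start, range.end);
                        mmu_notifier_invalidate_range(mm, range.start,
                                                      range.end);
                        page_vma_mapped_walk_done(&pvmw);
                        break;
                }
        } else {
                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
        }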
The cache level flush will always be first when changing an existing
virtual->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out[1], the architectures which support hugetlb
PMD sharing currently have no cache flush issues in practice. But I
think we should still follow the cache/TLB flushing rules when changing
a valid virtual address mapping, in case of potential issues in the
future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)
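The ordering the changelog appeals to can be sketched as follows. This is only an illustration of the generic cache/TLB flushing rule (flush caches while the old translation still exists, change the mapping, then flush the TLB), not code taken from this patch:

        /*
         * 1. Flush CPU caches while the virtual->physical translation still
         *    exists; strict, virtually indexed caches need the translation
         *    to write back/invalidate the lines for this range.
         */
        flush_cache_range(vma, start, end);

        /*
         * 2. Change or tear down the mapping, e.g. huge_pmd_unshare()
         *    unsharing the PMD page in the rmap paths touched here.
         */

        /* 3. Invalidate stale TLB entries for the old translation. */
        flush_tlb_range(vma, start, end);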