Message ID | 4f7ae6dfdc838ab71e1655188b657c032ff1f28f.1651056365.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Fix cache flush issues considering PMD sharing | expand |
On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote: > The cache level flush will always be first when changing an existing > virtual–>physical mapping to a new value, since this allows us to > properly handle systems whose caches are strict and require a > virtual–>physical translation to exist for a virtual address. So we > should move the cache flushing before huge_pmd_unshare(). > Right. > As Muchun pointed out[1], now the architectures whose supporting hugetlb > PMD sharing have no cache flush issues in practice. But I think we > should still follow the cache/TLB flushing rules when changing a valid > virtual address mapping in case of potential issues in future. Right. One point i need to clarify. I do not object this change but want you to clarify this (not an issue in practice) in commit log to let others know they do not need to bp this. > > [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/ > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/rmap.c | 40 ++++++++++++++++++++++------------------ > 1 file changed, 22 insertions(+), 18 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 61e63db..4f0d115 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, > * do this outside rmap routines. > */ > VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); > + /* > + * huge_pmd_unshare may unmap an entire PMD page. > + * There is no way of knowing exactly which PMDs may > + * be cached for this mm, so we must flush them all. > + * start/end were already adjusted above to cover this > + * range. > + */ > + flush_cache_range(vma, range.start, range.end); > + flush_cache_range() is always called even if we do not need to flush. How about introducing a new helper like hugetlb_pmd_shared() which returns true for shared PMD? Then: if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { flush_cache_range(vma, range.start, range.end); huge_pmd_unshare(mm, vma, &address, pvmw.pte); flush_tlb_range(vma, range.start, range.end); } The code could be a little simpler. Right? Thanks.
On 4/28/2022 1:55 PM, Muchun Song wrote: > On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote: >> The cache level flush will always be first when changing an existing >> virtual–>physical mapping to a new value, since this allows us to >> properly handle systems whose caches are strict and require a >> virtual–>physical translation to exist for a virtual address. So we >> should move the cache flushing before huge_pmd_unshare(). >> > > Right. > >> As Muchun pointed out[1], now the architectures whose supporting hugetlb >> PMD sharing have no cache flush issues in practice. But I think we >> should still follow the cache/TLB flushing rules when changing a valid >> virtual address mapping in case of potential issues in future. > > Right. One point i need to clarify. I do not object this change but > want you to clarify this (not an issue in practice) in commit log > to let others know they do not need to bp this. > >> >> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/ >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/rmap.c | 40 ++++++++++++++++++++++------------------ >> 1 file changed, 22 insertions(+), 18 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 61e63db..4f0d115 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, >> * do this outside rmap routines. >> */ >> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); >> + /* >> + * huge_pmd_unshare may unmap an entire PMD page. >> + * There is no way of knowing exactly which PMDs may >> + * be cached for this mm, so we must flush them all. >> + * start/end were already adjusted above to cover this >> + * range. >> + */ >> + flush_cache_range(vma, range.start, range.end); >> + > > flush_cache_range() is always called even if we do not need to flush. Right, this is intended. In the original code, if it is not a shared PMD, we will use flush_cache_page() to do cache flushing. However the flush_cache_page() can not cover the whole size of a hugetlb page on some architectures, which is fixed by patch 3. > How about introducing a new helper like hugetlb_pmd_shared() which > returns true for shared PMD? Then: > > if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { > flush_cache_range(vma, range.start, range.end); > huge_pmd_unshare(mm, vma, &address, pvmw.pte); > flush_tlb_range(vma, range.start, range.end); > } > > The code could be a little simpler. Right? IMHO after patch 3, the code will be changed as below, so seems no need to separate the validation of the shared PMDs from huge_pmd_unshare() into a new function. if (folio_test_hugetlb(folio)) { flush_cache_range(vma, range.start, range.end); if (!folio_test_anon(folio)) { VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { huge_pmd_unshare(mm, vma, &address, pvmw.pte)); flush_tlb_range(vma, range.start, range.end); ...... break; } } } else { ...... }
On 4/27/22 22:55, Muchun Song wrote: > On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote: >> The cache level flush will always be first when changing an existing >> virtual–>physical mapping to a new value, since this allows us to >> properly handle systems whose caches are strict and require a >> virtual–>physical translation to exist for a virtual address. So we >> should move the cache flushing before huge_pmd_unshare(). >> > > Right. > >> As Muchun pointed out[1], now the architectures whose supporting hugetlb >> PMD sharing have no cache flush issues in practice. But I think we >> should still follow the cache/TLB flushing rules when changing a valid >> virtual address mapping in case of potential issues in future. > > Right. One point i need to clarify. I do not object this change but > want you to clarify this (not an issue in practice) in commit log > to let others know they do not need to bp this. > >> >> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/ >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/rmap.c | 40 ++++++++++++++++++++++------------------ >> 1 file changed, 22 insertions(+), 18 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 61e63db..4f0d115 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, >> * do this outside rmap routines. >> */ >> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); >> + /* >> + * huge_pmd_unshare may unmap an entire PMD page. >> + * There is no way of knowing exactly which PMDs may >> + * be cached for this mm, so we must flush them all. >> + * start/end were already adjusted above to cover this >> + * range. >> + */ >> + flush_cache_range(vma, range.start, range.end); >> + > > flush_cache_range() is always called even if we do not need to flush. > How about introducing a new helper like hugetlb_pmd_shared() which > returns true for shared PMD? Then: > > if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { > flush_cache_range(vma, range.start, range.end); > huge_pmd_unshare(mm, vma, &address, pvmw.pte); > flush_tlb_range(vma, range.start, range.end); > } > > The code could be a little simpler. Right? > > Thanks. > I thought about adding a 'hugetlb_pmd_shared()' interface for another use. I believe it could even be used earlier in this call sequence. Since we hold i_mmap_rwsem, we would even test for shared BEFORE calling adjust_range_if_pmd_sharing_possible. We can not make an authoritative test in adjust range... because not all callers will be holding i_mmap_rwsem. I think we COULD optimize to minimize the flush range. However, I think that would complicate this code even more, and it is difficult enough to follow. My preference would be to over flush as is done here for correctness and simplification. We can optimize later if desired. With Muchun's comment that this is not an issue in practice today, Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
On 5/4/2022 2:42 AM, Mike Kravetz wrote: > On 4/27/22 22:55, Muchun Song wrote: >> On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote: >>> The cache level flush will always be first when changing an existing >>> virtual–>physical mapping to a new value, since this allows us to >>> properly handle systems whose caches are strict and require a >>> virtual–>physical translation to exist for a virtual address. So we >>> should move the cache flushing before huge_pmd_unshare(). >>> >> >> Right. >> >>> As Muchun pointed out[1], now the architectures whose supporting hugetlb >>> PMD sharing have no cache flush issues in practice. But I think we >>> should still follow the cache/TLB flushing rules when changing a valid >>> virtual address mapping in case of potential issues in future. >> >> Right. One point i need to clarify. I do not object this change but >> want you to clarify this (not an issue in practice) in commit log >> to let others know they do not need to bp this. >> >>> >>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/ >>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >>> --- >>> mm/rmap.c | 40 ++++++++++++++++++++++------------------ >>> 1 file changed, 22 insertions(+), 18 deletions(-) >>> >>> diff --git a/mm/rmap.c b/mm/rmap.c >>> index 61e63db..4f0d115 100644 >>> --- a/mm/rmap.c >>> +++ b/mm/rmap.c >>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, >>> * do this outside rmap routines. >>> */ >>> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); >>> + /* >>> + * huge_pmd_unshare may unmap an entire PMD page. >>> + * There is no way of knowing exactly which PMDs may >>> + * be cached for this mm, so we must flush them all. >>> + * start/end were already adjusted above to cover this >>> + * range. >>> + */ >>> + flush_cache_range(vma, range.start, range.end); >>> + >> >> flush_cache_range() is always called even if we do not need to flush. >> How about introducing a new helper like hugetlb_pmd_shared() which >> returns true for shared PMD? Then: >> >> if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { >> flush_cache_range(vma, range.start, range.end); >> huge_pmd_unshare(mm, vma, &address, pvmw.pte); >> flush_tlb_range(vma, range.start, range.end); >> } >> >> The code could be a little simpler. Right? >> >> Thanks. >> > > I thought about adding a 'hugetlb_pmd_shared()' interface for another use. > I believe it could even be used earlier in this call sequence. Since we > hold i_mmap_rwsem, we would even test for shared BEFORE calling > adjust_range_if_pmd_sharing_possible. We can not make an authoritative test > in adjust range... because not all callers will be holding i_mmap_rwsem. > > I think we COULD optimize to minimize the flush range. However, I think > that would complicate this code even more, and it is difficult enough to > follow. > > My preference would be to over flush as is done here for correctness and > simplification. We can optimize later if desired. OK. Agree. > > With Muchun's comment that this is not an issue in practice today, > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Thanks.
diff --git a/mm/rmap.c b/mm/rmap.c index 61e63db..4f0d115 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * do this outside rmap routines. */ VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + /* + * huge_pmd_unshare may unmap an entire PMD page. + * There is no way of knowing exactly which PMDs may + * be cached for this mm, so we must flush them all. + * start/end were already adjusted above to cover this + * range. + */ + flush_cache_range(vma, range.start, range.end); + if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { - /* - * huge_pmd_unshare unmapped an entire PMD - * page. There is no way of knowing exactly - * which PMDs may be cached for this mm, so - * we must flush them all. start/end were - * already adjusted above to cover this range. - */ - flush_cache_range(vma, range.start, range.end); flush_tlb_range(vma, range.start, range.end); mmu_notifier_invalidate_range(mm, range.start, range.end); @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } + } else { + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); } /* * Nuke the page table entry. When having to clear * PageAnonExclusive(), we always have to flush. */ - flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); if (should_defer_flush(mm, flags) && !anon_exclusive) { /* * We clear the PTE but do not flush so potentially @@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * do this outside rmap routines. */ VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + /* + * huge_pmd_unshare may unmap an entire PMD page. + * There is no way of knowing exactly which PMDs may + * be cached for this mm, so we must flush them all. + * start/end were already adjusted above to cover this + * range. + */ + flush_cache_range(vma, range.start, range.end); + if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { - /* - * huge_pmd_unshare unmapped an entire PMD - * page. There is no way of knowing exactly - * which PMDs may be cached for this mm, so - * we must flush them all. start/end were - * already adjusted above to cover this range. - */ - flush_cache_range(vma, range.start, range.end); flush_tlb_range(vma, range.start, range.end); mmu_notifier_invalidate_range(mm, range.start, range.end); @@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } + } else { + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); } /* Nuke the page table entry. */ - flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); pteval = ptep_clear_flush(vma, address, pvmw.pte); /* Set the dirty flag on the folio now the pte is gone. */
The cache level flush will always be first when changing an existing virtual–>physical mapping to a new value, since this allows us to properly handle systems whose caches are strict and require a virtual–>physical translation to exist for a virtual address. So we should move the cache flushing before huge_pmd_unshare(). As Muchun pointed out[1], now the architectures whose supporting hugetlb PMD sharing have no cache flush issues in practice. But I think we should still follow the cache/TLB flushing rules when changing a valid virtual address mapping in case of potential issues in future. [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/ Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/rmap.c | 40 ++++++++++++++++++++++------------------ 1 file changed, 22 insertions(+), 18 deletions(-)