Message ID | 20240613000721.23093-3-21cnbao@gmail.com (mailing list archive)
---|---
State | New
Series | mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()
On 13.06.24 02:07, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> the process of bringing up mTHP swapin.
>
> static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> 		struct page *page, int nr_pages, struct vm_area_struct *vma,
> 		unsigned long address, rmap_t flags, enum rmap_level level)
> {
> 	...
> 	if (unlikely(!folio_test_anon(folio))) {
> 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> 				 level != RMAP_LEVEL_PMD, folio);
> 	}
> 	...
> }
>
> It also enhances the code’s readability.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/memory.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 2f94921091fb..9c962f62f928 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (unlikely(folio != swapcache && swapcache)) {
>  		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>  		folio_add_lru_vma(folio, vma);
> +	} else if (!folio_test_anon(folio)) {
> +		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);

So, with large folio swapin, we would never end up here if any swp PTE
is !exclusive, because we would make sure to allocate a large folio only
for suitable regions, correct?

It can certainly happen during swapout + refault with large folios. But
there, we will always have folio_test_anon() still set and wouldn't run
into this code path.

(it will all be better with per-folio PAE bit, but that will take some
more time until I actually get to implement it properly, handling all
the nasty corner cases)

Better add a comment regarding why we are sure that the complete thing
is exclusive (e.g., currently only small folios).

>  	} else {
>  		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
>  					 rmap_flags);
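(For orientation, the three-way branch under discussion looks roughly like this after the patch; the per-arm annotations are paraphrased from the review discussion, not taken from the posted hunk.)

```c
	if (unlikely(folio != swapcache && swapcache)) {
		/* folio is a fresh private copy of the swapcache folio: always exclusive */
		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
		folio_add_lru_vma(folio, vma);
	} else if (!folio_test_anon(folio)) {
		/* first anon mapping of this folio, e.g. a swapin that bypassed the swapcache */
		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
	} else {
		/* folio is already anon, e.g. swapout + refault while still in the swapcache */
		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
					 rmap_flags);
	}
```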
On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 02:07, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> > with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> > suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> > the process of bringing up mTHP swapin.
> >
> > static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> > 		struct page *page, int nr_pages, struct vm_area_struct *vma,
> > 		unsigned long address, rmap_t flags, enum rmap_level level)
> > {
> > 	...
> > 	if (unlikely(!folio_test_anon(folio))) {
> > 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > 				 level != RMAP_LEVEL_PMD, folio);
> > 	}
> > 	...
> > }
> >
> > It also enhances the code’s readability.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  mm/memory.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 2f94921091fb..9c962f62f928 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	if (unlikely(folio != swapcache && swapcache)) {
> >  		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> >  		folio_add_lru_vma(folio, vma);
> > +	} else if (!folio_test_anon(folio)) {
> > +		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
>
> So, with large folio swapin, we would never end up here if any swp PTE
> is !exclusive, because we would make sure to allocate a large folio only
> for suitable regions, correct?
>
> It can certainly happen during swapout + refault with large folios. But
> there, we will always have folio_test_anon() still set and wouldn't run
> into this code path.
>
> (it will all be better with per-folio PAE bit, but that will take some
> more time until I actually get to implement it properly, handling all
> the nasty corner cases)
>
> Better add a comment regarding why we are sure that the complete thing
> is exclusive (e.g., currently only small folios).

No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
small folios could be non-exclusive. I suppose your question is whether we
should ensure that all pages within a mTHP have the same exclusivity status,
rather than always being exclusive?

>
> >  	} else {
> >  		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
> >  					 rmap_flags);
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
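(For readers landing on this patch alone: patch 1/3 of the series extends folio_add_new_anon_rmap() to take rmap flags. A sketch of the assumed prototype, not copied from that patch:)

```c
/*
 * Assumed interface after patch 1/3 (sketch): callers now pass either
 * RMAP_EXCLUSIVE or 0 instead of the folio always being treated as exclusive.
 */
void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
			     unsigned long address, rmap_t flags);
```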
On 13.06.24 11:58, Barry Song wrote:
> On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.06.24 02:07, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
>>> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
>>> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
>>> the process of bringing up mTHP swapin.
>>>
>>> static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>>> 		struct page *page, int nr_pages, struct vm_area_struct *vma,
>>> 		unsigned long address, rmap_t flags, enum rmap_level level)
>>> {
>>> 	...
>>> 	if (unlikely(!folio_test_anon(folio))) {
>>> 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>> 				 level != RMAP_LEVEL_PMD, folio);
>>> 	}
>>> 	...
>>> }
>>>
>>> It also enhances the code’s readability.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>  mm/memory.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 2f94921091fb..9c962f62f928 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>  	if (unlikely(folio != swapcache && swapcache)) {
>>>  		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>>>  		folio_add_lru_vma(folio, vma);
>>> +	} else if (!folio_test_anon(folio)) {
>>> +		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
>>
>> So, with large folio swapin, we would never end up here if any swp PTE
>> is !exclusive, because we would make sure to allocate a large folio only
>> for suitable regions, correct?
>>
>> It can certainly happen during swapout + refault with large folios. But
>> there, we will always have folio_test_anon() still set and wouldn't run
>> into this code path.
>>
>> (it will all be better with per-folio PAE bit, but that will take some
>> more time until I actually get to implement it properly, handling all
>> the nasty corner cases)
>>
>> Better add a comment regarding why we are sure that the complete thing
>> is exclusive (e.g., currently only small folios).
>
> No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
> small folios could be non-exclusive. I suppose your question is whether we
> should ensure that all pages within a mTHP have the same exclusivity status,
> rather than always being exclusive?

We must never end up calling folio_add_new_anon_rmap(folio, vma,
address, RMAP_EXCLUSIVE) if part of the folio is non-exclusive.

I think we *may* call folio_add_new_anon_rmap(folio, vma, address, 0) if
part of the folio is exclusive [it got swapped out, so there cannot be
GUP references]. But, to be future proof (single PAE bit per folio), we
better avoid that on swapin if possible.

For small folios, it's clear that it cannot be partially exclusive
(single page).

We better comment why we obey the above. Like

"We currently ensure that new folios cannot be partially exclusive: they
are either fully exclusive or fully shared."
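(A minimal sketch of the invariant spelled out above, assuming a swapin caller with hypothetical locals start_pte, nr_pages and rmap_flags; pte_swp_exclusive() and ptep_get() are the existing helpers:)

```c
	/*
	 * Illustrative sketch only: RMAP_EXCLUSIVE is only legal here if every
	 * swap PTE covered by the new (m)THP is marked exclusive; a partially
	 * exclusive folio must not be added as a "new" exclusive anon folio.
	 */
	if (rmap_flags & RMAP_EXCLUSIVE) {
		int i;

		for (i = 0; i < nr_pages; i++)
			VM_WARN_ON_ONCE(!pte_swp_exclusive(ptep_get(start_pte + i)));
	}
	folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
```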
On Thu, Jun 13, 2024 at 10:37 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 11:58, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.06.24 02:07, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> >>> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> >>> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> >>> the process of bringing up mTHP swapin.
> >>>
> >>> static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >>> 		struct page *page, int nr_pages, struct vm_area_struct *vma,
> >>> 		unsigned long address, rmap_t flags, enum rmap_level level)
> >>> {
> >>> 	...
> >>> 	if (unlikely(!folio_test_anon(folio))) {
> >>> 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >>> 				 level != RMAP_LEVEL_PMD, folio);
> >>> 	}
> >>> 	...
> >>> }
> >>>
> >>> It also enhances the code’s readability.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>> ---
> >>>  mm/memory.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index 2f94921091fb..9c962f62f928 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>  	if (unlikely(folio != swapcache && swapcache)) {
> >>>  		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> >>>  		folio_add_lru_vma(folio, vma);
> >>> +	} else if (!folio_test_anon(folio)) {
> >>> +		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
> >>
> >> So, with large folio swapin, we would never end up here if any swp PTE
> >> is !exclusive, because we would make sure to allocate a large folio only
> >> for suitable regions, correct?
> >>
> >> It can certainly happen during swapout + refault with large folios. But
> >> there, we will always have folio_test_anon() still set and wouldn't run
> >> into this code path.
> >>
> >> (it will all be better with per-folio PAE bit, but that will take some
> >> more time until I actually get to implement it properly, handling all
> >> the nasty corner cases)
> >>
> >> Better add a comment regarding why we are sure that the complete thing
> >> is exclusive (e.g., currently only small folios).
> >
> > No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
> > small folios could be non-exclusive. I suppose your question is whether we
> > should ensure that all pages within a mTHP have the same exclusivity status,
> > rather than always being exclusive?
>
> We must never end up calling folio_add_new_anon_rmap(folio, vma,
> address, RMAP_EXCLUSIVE) if part of the folio is non-exclusive.

Agreed.

>
> I think we *may* call folio_add_new_anon_rmap(folio, vma, address, 0) if
> part of the folio is exclusive [it got swapped out, so there cannot be
> GUP references]. But, to be future proof (single PAE bit per folio), we
> better avoid that on swapin if possible.
>
> For small folios, it's clear that it cannot be partially exclusive
> (single page).

Yes, for small folios, it is much simpler; they are either entirely exclusive
or entirely non-exclusive. For the initial swapin patch [1], which only
supports SYNC-IO mTHP swapin, mTHP is always entirely exclusive.

I am also considering non-SYNCHRONOUS_IO swapcache-based mTHP swapin but
intend to start with the entirely exclusive cases.

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

>
> We better comment why we obey the above. Like
>
> "We currently ensure that new folios cannot be partially exclusive: they
> are either fully exclusive or fully shared."
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
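(For context on why the SYNC-IO path can treat the whole folio as exclusive: the swapcache-bypassing branch in do_swap_page() is only taken when the faulting task is the sole user of the swap entry. Simplified sketch of that existing gate:)

```c
	/*
	 * Simplified sketch of the existing check in do_swap_page(): the
	 * swapcache is only bypassed when this task is the sole user of the
	 * swap entry, so a folio swapped in on this path cannot be shared.
	 */
	if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
	    __swap_count(entry) == 1) {
		/* skip the swapcache: allocate a folio and read into it directly */
	}
```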
diff --git a/mm/memory.c b/mm/memory.c
index 2f94921091fb..9c962f62f928 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(folio != swapcache && swapcache)) {
 		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
+	} else if (!folio_test_anon(folio)) {
+		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
 	} else {
 		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
 					 rmap_flags);
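(A sketch of how the comment requested in the review could be folded into this hunk, reusing the wording suggested above; illustrative, not the version posted here:)

```c
	} else if (!folio_test_anon(folio)) {
		/*
		 * We currently ensure that new folios cannot be partially
		 * exclusive: they are either fully exclusive or fully shared.
		 */
		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
	} else {
```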