[RFC,2/3] mm: do_swap_page: use folio_add_new_anon_rmap() if folio_test_anon(folio)==false

Message ID 20240613000721.23093-3-21cnbao@gmail.com (mailing list archive)
State New
Series mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()

Commit Message

Barry Song June 13, 2024, 12:07 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This avoids
triggering the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() once
mTHP swapin is brought up.

 static __always_inline void __folio_add_anon_rmap(struct folio *folio,
                 struct page *page, int nr_pages, struct vm_area_struct *vma,
                 unsigned long address, rmap_t flags, enum rmap_level level)
 {
         ...
         if (unlikely(!folio_test_anon(folio))) {
                 VM_WARN_ON_FOLIO(folio_test_large(folio) &&
                                  level != RMAP_LEVEL_PMD, folio);
         }
         ...
 }

It also enhances the code’s readability.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/memory.c | 2 ++
 1 file changed, 2 insertions(+)
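
For context, the rmap setup in do_swap_page() ends up with the following branch
structure after this patch; the comments below are illustrative and slightly
simplified, not part of the patch itself:

	if (unlikely(folio != swapcache && swapcache)) {
		/*
		 * A freshly allocated copy of the swapcache folio (e.g. the
		 * KSM copy path): it is new to the rmap and exclusive to
		 * this mapping.
		 */
		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
		folio_add_lru_vma(folio, vma);
	} else if (!folio_test_anon(folio)) {
		/*
		 * The folio is not yet anonymous (e.g. it was just read in
		 * for swapin). With patch 1/3, folio_add_new_anon_rmap()
		 * accepts rmap flags, so both exclusive and non-exclusive
		 * swapins take this path without hitting the
		 * VM_WARN_ON_FOLIO() in __folio_add_anon_rmap().
		 */
		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
	} else {
		/* An already-anonymous folio refaulted from the swapcache. */
		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
					rmap_flags);
	}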

Comments

David Hildenbrand June 13, 2024, 9:23 a.m. UTC | #1
On 13.06.24 02:07, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> the process of bringing up mTHP swapin.
> 
>   static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>                   struct page *page, int nr_pages, struct vm_area_struct *vma,
>                   unsigned long address, rmap_t flags, enum rmap_level level)
>   {
>           ...
>           if (unlikely(!folio_test_anon(folio))) {
>                   VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>                                    level != RMAP_LEVEL_PMD, folio);
>           }
>           ...
>   }
> 
> It also enhances the code’s readability.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>   mm/memory.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 2f94921091fb..9c962f62f928 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>   	if (unlikely(folio != swapcache && swapcache)) {
>   		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>   		folio_add_lru_vma(folio, vma);
> +	} else if (!folio_test_anon(folio)) {
> +		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);

So, with large folio swapin, we would never end up here if any swp PTE 
is !exclusive, because we would make sure to allocate a large folio only 
for suitable regions, correct?

It can certainly happen during swapout + refault with large folios. But 
there, we will always have folio_test_anon() still set and wouldn't run 
into this code path.

(it will all be better with per-folio PAE bit, but that will take some 
more time until I actually get to implement it properly, handling all 
the nasty corner cases)

Better add a comment explaining why we are sure that the whole folio is 
exclusive (e.g., currently only small folios).

>   	} else {
>   		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
>   					rmap_flags);
Barry Song June 13, 2024, 9:58 a.m. UTC | #2
On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 02:07, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> > with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> > suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> > the process of bringing up mTHP swapin.
> >
> >   static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >                   struct page *page, int nr_pages, struct vm_area_struct *vma,
> >                   unsigned long address, rmap_t flags, enum rmap_level level)
> >   {
> >           ...
> >           if (unlikely(!folio_test_anon(folio))) {
> >                   VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >                                    level != RMAP_LEVEL_PMD, folio);
> >           }
> >           ...
> >   }
> >
> > It also enhances the code’s readability.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >   mm/memory.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 2f94921091fb..9c962f62f928 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >       if (unlikely(folio != swapcache && swapcache)) {
> >               folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> >               folio_add_lru_vma(folio, vma);
> > +     } else if (!folio_test_anon(folio)) {
> > +             folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
>
> So, with large folio swapin, we would never end up here if any swp PTE
> is !exclusive, because we would make sure to allocate a large folio only
> for suitable regions, correct?
>
> It can certainly happen during swapout + refault with large folios. But
> there, we will always have folio_test_anon() still set and wouldn't run
> into this code path.
>
> (it will all be better with per-folio PAE bit, but that will take some
> more time until I actually get to implement it properly, handling all
> the nasty corner cases)
>
> Better add a comment regarding why we are sure that the complete thing
> is exclusive (e.g., currently only small folios).

No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
small folios could be non-exclusive. I suppose your question is whether we
should ensure that all pages within an mTHP have the same exclusivity status,
rather than always being exclusive?

>
> >       } else {
> >               folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
> >                                       rmap_flags);
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
David Hildenbrand June 13, 2024, 10:37 a.m. UTC | #3
On 13.06.24 11:58, Barry Song wrote:
> On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.06.24 02:07, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
>>> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
>>> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
>>> the process of bringing up mTHP swapin.
>>>
>>>    static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>>>                    struct page *page, int nr_pages, struct vm_area_struct *vma,
>>>                    unsigned long address, rmap_t flags, enum rmap_level level)
>>>    {
>>>            ...
>>>            if (unlikely(!folio_test_anon(folio))) {
>>>                    VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>                                     level != RMAP_LEVEL_PMD, folio);
>>>            }
>>>            ...
>>>    }
>>>
>>> It also enhances the code’s readability.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>    mm/memory.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 2f94921091fb..9c962f62f928 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>        if (unlikely(folio != swapcache && swapcache)) {
>>>                folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>>>                folio_add_lru_vma(folio, vma);
>>> +     } else if (!folio_test_anon(folio)) {
>>> +             folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
>>
>> So, with large folio swapin, we would never end up here if any swp PTE
>> is !exclusive, because we would make sure to allocate a large folio only
>> for suitable regions, correct?
>>
>> It can certainly happen during swapout + refault with large folios. But
>> there, we will always have folio_test_anon() still set and wouldn't run
>> into this code path.
>>
>> (it will all be better with per-folio PAE bit, but that will take some
>> more time until I actually get to implement it properly, handling all
>> the nasty corner cases)
>>
>> Better add a comment regarding why we are sure that the complete thing
>> is exclusive (e.g., currently only small folios).
> 
> No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
> small folios could be non-exclusive. I suppose your question is
> whether we should
> ensure that all pages within a mTHP have the same exclusivity status,
> rather than
> always being exclusive?

We must never end up calling folio_add_new_anon_rmap(folio, vma, 
address, RMAP_EXCLUSIVE) if part of the folio is non-exclusive.

I think we *may* call folio_add_new_anon_rmap(folio, vma, address, 0) if 
part of the folio is exclusive [it got swapped out, so there cannot be 
GUP references]. But, to be future proof (single PAE bit per folio), we'd 
better avoid that on swapin if possible.

For small folios, it's clear that it cannot be partially exclusive 
(single page).

We'd better add a comment explaining why we obey the above, like:

"We currently ensure that new folios cannot be partially exclusive: they 
are either fully exclusive or fully shared."
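
For illustration, such a comment could sit directly above the new branch in
do_swap_page(); this is only a sketch of the suggestion above, not something
taken from the series:

	} else if (!folio_test_anon(folio)) {
		/*
		 * We currently ensure that new folios cannot be partially
		 * exclusive: they are either fully exclusive or fully shared.
		 */
		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);

This matches the direction of keeping whole folios either fully exclusive or
fully shared until a per-folio PAE bit exists.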
Barry Song June 13, 2024, 10:55 a.m. UTC | #4
On Thu, Jun 13, 2024 at 10:37 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 11:58, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.06.24 02:07, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> >>> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> >>> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> >>> the process of bringing up mTHP swapin.
> >>>
> >>>    static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >>>                    struct page *page, int nr_pages, struct vm_area_struct *vma,
> >>>                    unsigned long address, rmap_t flags, enum rmap_level level)
> >>>    {
> >>>            ...
> >>>            if (unlikely(!folio_test_anon(folio))) {
> >>>                    VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >>>                                     level != RMAP_LEVEL_PMD, folio);
> >>>            }
> >>>            ...
> >>>    }
> >>>
> >>> It also enhances the code’s readability.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>> ---
> >>>    mm/memory.c | 2 ++
> >>>    1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index 2f94921091fb..9c962f62f928 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>        if (unlikely(folio != swapcache && swapcache)) {
> >>>                folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> >>>                folio_add_lru_vma(folio, vma);
> >>> +     } else if (!folio_test_anon(folio)) {
> >>> +             folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
> >>
> >> So, with large folio swapin, we would never end up here if any swp PTE
> >> is !exclusive, because we would make sure to allocate a large folio only
> >> for suitable regions, correct?
> >>
> >> It can certainly happen during swapout + refault with large folios. But
> >> there, we will always have folio_test_anon() still set and wouldn't run
> >> into this code path.
> >>
> >> (it will all be better with per-folio PAE bit, but that will take some
> >> more time until I actually get to implement it properly, handling all
> >> the nasty corner cases)
> >>
> >> Better add a comment regarding why we are sure that the complete thing
> >> is exclusive (e.g., currently only small folios).
> >
> > No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
> > small folios could be non-exclusive. I suppose your question is
> > whether we should
> > ensure that all pages within a mTHP have the same exclusivity status,
> > rather than
> > always being exclusive?
>
> We must never end up calling folio_add_new_anon_rmap(folio, vma,
> address, RMAP_EXCLUSIVE) if part of the folio is non-exclusive.

Agreed.

>
> I think we *may* call folio_add_new_anon_rmap(folio, vma, address, 0) if
> part of the folio is exclusive [it go swapped out, so there cannot be
> GUP references]. But, to be future proof (single PAE bit per folio), we
> better avoid that on swapin if possible.
>
> For small folios, it's clear that it cannot be partially exclusive
> (single page).

Yes, for small folios, it is much simpler; they are either entirely exclusive or
entirely non-exclusive.

For the initial swapin patch[1], which only supports SYNC-IO mTHP swapin,
mTHP is always entirely exclusive. I am also considering non-SYNCHRONOUS_IO
swapcache-based mTHP swapin but intend to start with the entirely exclusive
cases.

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/
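
For reference, the kind of per-PTE check implied here could look roughly like
the hypothetical helper below; it is not part of the series and assumes the
swap PTEs covering the folio are contiguous and read under the page table lock:

/*
 * Hypothetical helper: report whether every swap PTE covering the
 * folio-sized range is marked exclusive, i.e. whether the whole mTHP
 * could safely be added with RMAP_EXCLUSIVE.
 */
static bool swap_ptes_all_exclusive(pte_t *ptep, int nr_pages)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
		pte_t pte = ptep_get(ptep + i);

		if (!is_swap_pte(pte) || !pte_swp_exclusive(pte))
			return false;
	}
	return true;
}

A swapin path could run such a check over the nr_pages swap PTEs before
deciding to pass RMAP_EXCLUSIVE for the whole folio.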

>
> We better comment why we obey to the above. Like
>
> "We currently ensure that new folios cannot be partially exclusive: they
> are either fully exclusive or fully shared."
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 2f94921091fb..9c962f62f928 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4339,6 +4339,8 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(folio != swapcache && swapcache)) {
 		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
+	} else if (!folio_test_anon(folio)) {
+		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
 	} else {
 		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
 					rmap_flags);