
[v5] mm: shrink skip folio mapped by an exiting process

Message ID 20240708090413.888-1-justinjiang@vivo.com (mailing list archive)
State New

Commit Message

zhiguojiang July 8, 2024, 9:04 a.m. UTC
Releasing a non-shared anonymous folio mapped solely by an exiting
process may go through two flows: 1) the anonymous folio is first
swapped out into swap space and transformed into a swp_entry in
shrink_folio_list; 2) the swp_entry is then released in the process
exiting flow. This increases the cpu load of releasing such a folio,
because the folio goes through both the swap-out and the release of the
swap space and swp_entry.

When the system is low on memory, this is more likely to occur, because
more background applications will be killed.

With this modification, shrink skips the non-shared anonymous folio
solely mapped by an exiting process, and the folio is released directly
in the process exiting flow instead, which saves swap-out time and
alleviates the load of the exiting process.

Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
---

Change log:
v4->v5:
1.Modify to skip non-shared anonymous folio only.
2.Update comments for pra->referenced = -1.
v3->v4:
1.Modify so that unshared folios mapped only in the exiting task are skipped.
v2->v3:
Nothing.
v1->v2:
1.The VM_EXITING flag added in the v1 patch is removed, because it
fails to compile on 32-bit systems.

 mm/rmap.c   | 13 +++++++++++++
 mm/vmscan.c |  7 ++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

Comments

David Hildenbrand July 8, 2024, 9:36 a.m. UTC | #1
On 08.07.24 11:04, Zhiguo Jiang wrote:
> The releasing process of the non-shared anonymous folio mapped solely by
> an exiting process may go through two flows: 1) the anonymous folio is
> firstly is swaped-out into swapspace and transformed into a swp_entry
> in shrink_folio_list; 2) then the swp_entry is released in the process
> exiting flow. This will increase the cpu load of releasing a non-shared
> anonymous folio mapped solely by an exiting process, because the folio
> go through swap-out and the releasing the swapspace and swp_entry.
> 
> When system is low memory, it is more likely to occur, because more
> backend applidatuions will be killed.
> 
> The modification is that shrink skips the non-shared anonymous folio
> solely mapped by an exting process and the folio is only released
> directly in the process exiting flow, which will save swap-out time
> and alleviate the load of the process exiting.
> 
> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> ---
> 
> Change log:
> v4->v5:
> 1.Modify to skip non-shared anonymous folio only.
> 2.Update comments for pra->referenced = -1.
> v3->v4:
> 1.Modify that the unshared folios mapped only in exiting task are skip.
> v2->v3:
> Nothing.
> v1->v2:
> 1.The VM_EXITING added in v1 patch is removed, because it will fail
> to compile in 32-bit system.
> 
>   mm/rmap.c   | 13 +++++++++++++
>   mm/vmscan.c |  7 ++++++-
>   2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 26806b49a86f..5b5281d71dbb
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
>   	int referenced = 0;
>   	unsigned long start = address, ptes = 0;
>   
> +	/*
> +	 * Skip the non-shared anonymous folio mapped solely by
> +	 * the single exiting process, and release it directly
> +	 * in the process exiting.
> +	 */
> +	if ((!atomic_read(&vma->vm_mm->mm_users) ||
> +		test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> +		folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> +		!folio_likely_mapped_shared(folio)) {

I'm currently working on moving all folio_likely_mapped_shared() under
the PTL, where we are then sure that the folio is actually mapped by
this process (e.g., no concurrent unmapping possible).

Can we do the same here directly?
Barry Song July 8, 2024, 9:46 a.m. UTC | #2
On Mon, Jul 8, 2024 at 9:36 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.24 11:04, Zhiguo Jiang wrote:
> > The releasing process of the non-shared anonymous folio mapped solely by
> > an exiting process may go through two flows: 1) the anonymous folio is
> > firstly is swaped-out into swapspace and transformed into a swp_entry
> > in shrink_folio_list; 2) then the swp_entry is released in the process
> > exiting flow. This will increase the cpu load of releasing a non-shared
> > anonymous folio mapped solely by an exiting process, because the folio
> > go through swap-out and the releasing the swapspace and swp_entry.
> >
> > When system is low memory, it is more likely to occur, because more
> > backend applidatuions will be killed.
> >
> > The modification is that shrink skips the non-shared anonymous folio
> > solely mapped by an exting process and the folio is only released
> > directly in the process exiting flow, which will save swap-out time
> > and alleviate the load of the process exiting.
> >
> > Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> > ---
> >
> > Change log:
> > v4->v5:
> > 1.Modify to skip non-shared anonymous folio only.
> > 2.Update comments for pra->referenced = -1.
> > v3->v4:
> > 1.Modify that the unshared folios mapped only in exiting task are skip.
> > v2->v3:
> > Nothing.
> > v1->v2:
> > 1.The VM_EXITING added in v1 patch is removed, because it will fail
> > to compile in 32-bit system.
> >
> >   mm/rmap.c   | 13 +++++++++++++
> >   mm/vmscan.c |  7 ++++++-
> >   2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 26806b49a86f..5b5281d71dbb
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
> >       int referenced = 0;
> >       unsigned long start = address, ptes = 0;
> >
> > +     /*
> > +      * Skip the non-shared anonymous folio mapped solely by
> > +      * the single exiting process, and release it directly
> > +      * in the process exiting.
> > +      */
> > +     if ((!atomic_read(&vma->vm_mm->mm_users) ||
> > +             test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> > +             folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> > +             !folio_likely_mapped_shared(folio)) {
>
> I'm currently working on moving all folio_likely_mapped_shared() under
> the PTL, where we are then sure that the folio is actually mapped by
> this process (e.g., no concurrent unmapping possible).
>
> Can we do the same here directly?

Implementing this is challenging because page_vma_mapped_walk() is
responsible for traversing the page table to acquire and release the
PTL. This becomes particularly complex with mTHP, as we may need to
interrupt the page_vma_mapped_walk loop at the first PTE.

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
David Hildenbrand July 8, 2024, 9:49 a.m. UTC | #3
On 08.07.24 11:46, Barry Song wrote:
> On Mon, Jul 8, 2024 at 9:36 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.07.24 11:04, Zhiguo Jiang wrote:
>>> The releasing process of the non-shared anonymous folio mapped solely by
>>> an exiting process may go through two flows: 1) the anonymous folio is
>>> firstly is swaped-out into swapspace and transformed into a swp_entry
>>> in shrink_folio_list; 2) then the swp_entry is released in the process
>>> exiting flow. This will increase the cpu load of releasing a non-shared
>>> anonymous folio mapped solely by an exiting process, because the folio
>>> go through swap-out and the releasing the swapspace and swp_entry.
>>>
>>> When system is low memory, it is more likely to occur, because more
>>> backend applidatuions will be killed.
>>>
>>> The modification is that shrink skips the non-shared anonymous folio
>>> solely mapped by an exting process and the folio is only released
>>> directly in the process exiting flow, which will save swap-out time
>>> and alleviate the load of the process exiting.
>>>
>>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>>> ---
>>>
>>> Change log:
>>> v4->v5:
>>> 1.Modify to skip non-shared anonymous folio only.
>>> 2.Update comments for pra->referenced = -1.
>>> v3->v4:
>>> 1.Modify that the unshared folios mapped only in exiting task are skip.
>>> v2->v3:
>>> Nothing.
>>> v1->v2:
>>> 1.The VM_EXITING added in v1 patch is removed, because it will fail
>>> to compile in 32-bit system.
>>>
>>>    mm/rmap.c   | 13 +++++++++++++
>>>    mm/vmscan.c |  7 ++++++-
>>>    2 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 26806b49a86f..5b5281d71dbb
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
>>>        int referenced = 0;
>>>        unsigned long start = address, ptes = 0;
>>>
>>> +     /*
>>> +      * Skip the non-shared anonymous folio mapped solely by
>>> +      * the single exiting process, and release it directly
>>> +      * in the process exiting.
>>> +      */
>>> +     if ((!atomic_read(&vma->vm_mm->mm_users) ||
>>> +             test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>>> +             folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>>> +             !folio_likely_mapped_shared(folio)) {
>>
>> I'm currently working on moving all folio_likely_mapped_shared() under
>> the PTL, where we are then sure that the folio is actually mapped by
>> this process (e.g., no concurrent unmapping possible).
>>
>> Can we do the same here directly?
> 
> Implementing this is challenging because page_vma_mapped_walk() is
> responsible for traversing the page table to acquire and release the
> PTL. This becomes particularly complex with mTHP, as we may need to
> interrupt the page_vma_mapped_walk loop at the first PTE.

Why can't we perform the check under the PTL and bail out? I'm missing 
something important.
Barry Song July 8, 2024, 10:05 a.m. UTC | #4
On Mon, Jul 8, 2024 at 9:49 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.24 11:46, Barry Song wrote:
> > On Mon, Jul 8, 2024 at 9:36 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 08.07.24 11:04, Zhiguo Jiang wrote:
> >>> The releasing process of the non-shared anonymous folio mapped solely by
> >>> an exiting process may go through two flows: 1) the anonymous folio is
> >>> firstly is swaped-out into swapspace and transformed into a swp_entry
> >>> in shrink_folio_list; 2) then the swp_entry is released in the process
> >>> exiting flow. This will increase the cpu load of releasing a non-shared
> >>> anonymous folio mapped solely by an exiting process, because the folio
> >>> go through swap-out and the releasing the swapspace and swp_entry.
> >>>
> >>> When system is low memory, it is more likely to occur, because more
> >>> backend applidatuions will be killed.
> >>>
> >>> The modification is that shrink skips the non-shared anonymous folio
> >>> solely mapped by an exting process and the folio is only released
> >>> directly in the process exiting flow, which will save swap-out time
> >>> and alleviate the load of the process exiting.
> >>>
> >>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> >>> ---
> >>>
> >>> Change log:
> >>> v4->v5:
> >>> 1.Modify to skip non-shared anonymous folio only.
> >>> 2.Update comments for pra->referenced = -1.
> >>> v3->v4:
> >>> 1.Modify that the unshared folios mapped only in exiting task are skip.
> >>> v2->v3:
> >>> Nothing.
> >>> v1->v2:
> >>> 1.The VM_EXITING added in v1 patch is removed, because it will fail
> >>> to compile in 32-bit system.
> >>>
> >>>    mm/rmap.c   | 13 +++++++++++++
> >>>    mm/vmscan.c |  7 ++++++-
> >>>    2 files changed, 19 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 26806b49a86f..5b5281d71dbb
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
> >>>        int referenced = 0;
> >>>        unsigned long start = address, ptes = 0;
> >>>
> >>> +     /*
> >>> +      * Skip the non-shared anonymous folio mapped solely by
> >>> +      * the single exiting process, and release it directly
> >>> +      * in the process exiting.
> >>> +      */
> >>> +     if ((!atomic_read(&vma->vm_mm->mm_users) ||
> >>> +             test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> >>> +             folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> >>> +             !folio_likely_mapped_shared(folio)) {
> >>
> >> I'm currently working on moving all folio_likely_mapped_shared() under
> >> the PTL, where we are then sure that the folio is actually mapped by
> >> this process (e.g., no concurrent unmapping possible).
> >>
> >> Can we do the same here directly?
> >
> > Implementing this is challenging because page_vma_mapped_walk() is
> > responsible for traversing the page table to acquire and release the
> > PTL. This becomes particularly complex with mTHP, as we may need to
> > interrupt the page_vma_mapped_walk loop at the first PTE.
>
> Why can't we perform the check under the PTL and bail out? I'm missing
> something important.

You are right. We might use page_vma_mapped_walk_done() to bail out.

>
> --
> Cheers,
>
> David / dhildenb
>
Barry Song July 8, 2024, 11:02 a.m. UTC | #5
On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com> wrote:
>
> The releasing process of the non-shared anonymous folio mapped solely by
> an exiting process may go through two flows: 1) the anonymous folio is
> firstly is swaped-out into swapspace and transformed into a swp_entry
> in shrink_folio_list; 2) then the swp_entry is released in the process
> exiting flow. This will increase the cpu load of releasing a non-shared
> anonymous folio mapped solely by an exiting process, because the folio
> go through swap-out and the releasing the swapspace and swp_entry.
>
> When system is low memory, it is more likely to occur, because more
> backend applidatuions will be killed.
>
> The modification is that shrink skips the non-shared anonymous folio
> solely mapped by an exting process and the folio is only released
> directly in the process exiting flow, which will save swap-out time
> and alleviate the load of the process exiting.
>
> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> ---
>
> Change log:
> v4->v5:
> 1.Modify to skip non-shared anonymous folio only.
> 2.Update comments for pra->referenced = -1.
> v3->v4:
> 1.Modify that the unshared folios mapped only in exiting task are skip.
> v2->v3:
> Nothing.
> v1->v2:
> 1.The VM_EXITING added in v1 patch is removed, because it will fail
> to compile in 32-bit system.
>
>  mm/rmap.c   | 13 +++++++++++++
>  mm/vmscan.c |  7 ++++++-
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 26806b49a86f..5b5281d71dbb
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
>         int referenced = 0;
>         unsigned long start = address, ptes = 0;
>
> +       /*
> +        * Skip the non-shared anonymous folio mapped solely by
> +        * the single exiting process, and release it directly
> +        * in the process exiting.
> +        */
> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> +               !folio_likely_mapped_shared(folio)) {
> +               pra->referenced = -1;
> +               return false;
> +       }
> +
>         while (page_vma_mapped_walk(&pvmw)) {
>                 address = pvmw.address;

As David suggested, what about the below?

@@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio *folio,
                        continue;
                }

+               /*
+                * Skip the non-shared anonymous folio mapped solely by
+                * the single exiting process, and release it directly
+                * in the process exiting.
+                */
+               if ((!atomic_read(&vma->vm_mm->mm_users) ||
+                    test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
+                   folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+                   !folio_likely_mapped_shared(folio)) {
+                       pra->referenced = -1;
+                       page_vma_mapped_walk_done(&pvmw);
+                       return false;
+               }
+
                if (pvmw.pte) {
                        if (lru_gen_enabled() &&
                            pte_young(ptep_get(pvmw.pte))) {


By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
&vma->vm_mm->flags) is correct (I think it is wrong). For example,
global_init can directly have it:
                if (is_global_init(p)) {
                        can_oom_reap = false;
                        set_bit(MMF_OOM_SKIP, &mm->flags);
                        pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
                                        task_pid_nr(victim), victim->comm,
                                        task_pid_nr(p), p->comm);
                        continue;
                }

And exit_mmap() automatically has MMF_OOM_SKIP.

What is the purpose of this check? Is there a better way to determine
if a process is an OOM target? What about check_stable_address_space()?


>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0761f91b407f..bae7a8bf6b3d
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -863,7 +863,12 @@ static enum folio_references folio_check_references(struct folio *folio,
>         if (vm_flags & VM_LOCKED)
>                 return FOLIOREF_ACTIVATE;
>
> -       /* rmap lock contention: rotate */
> +       /*
> +        * There are two cases to consider.
> +        * 1) Rmap lock contention: rotate.
> +        * 2) Skip the non-shared anonymous folio mapped solely by
> +        *    the single exiting process.
> +        */
>         if (referenced_ptes == -1)
>                 return FOLIOREF_KEEP;
>
> --
> 2.39.0
>

Thanks
Barry
zhiguojiang July 8, 2024, 12:17 p.m. UTC | #6
On 2024/7/8 19:02, Barry Song wrote:
> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com> wrote:
>> The releasing process of the non-shared anonymous folio mapped solely by
>> an exiting process may go through two flows: 1) the anonymous folio is
>> firstly is swaped-out into swapspace and transformed into a swp_entry
>> in shrink_folio_list; 2) then the swp_entry is released in the process
>> exiting flow. This will increase the cpu load of releasing a non-shared
>> anonymous folio mapped solely by an exiting process, because the folio
>> go through swap-out and the releasing the swapspace and swp_entry.
>>
>> When system is low memory, it is more likely to occur, because more
>> backend applidatuions will be killed.
>>
>> The modification is that shrink skips the non-shared anonymous folio
>> solely mapped by an exting process and the folio is only released
>> directly in the process exiting flow, which will save swap-out time
>> and alleviate the load of the process exiting.
>>
>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>> ---
>>
>> Change log:
>> v4->v5:
>> 1.Modify to skip non-shared anonymous folio only.
>> 2.Update comments for pra->referenced = -1.
>> v3->v4:
>> 1.Modify that the unshared folios mapped only in exiting task are skip.
>> v2->v3:
>> Nothing.
>> v1->v2:
>> 1.The VM_EXITING added in v1 patch is removed, because it will fail
>> to compile in 32-bit system.
>>
>>   mm/rmap.c   | 13 +++++++++++++
>>   mm/vmscan.c |  7 ++++++-
>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 26806b49a86f..5b5281d71dbb
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
>>          int referenced = 0;
>>          unsigned long start = address, ptes = 0;
>>
>> +       /*
>> +        * Skip the non-shared anonymous folio mapped solely by
>> +        * the single exiting process, and release it directly
>> +        * in the process exiting.
>> +        */
>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>> +               !folio_likely_mapped_shared(folio)) {
>> +               pra->referenced = -1;
>> +               return false;
>> +       }
>> +
>>          while (page_vma_mapped_walk(&pvmw)) {
>>                  address = pvmw.address;
Sure, I agree with your suggested modification. This way, taking the
PTL first indeed ensures that the folio is actually mapped by this
process.
Thanks
> As David suggested, what about the below?
>
> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio *folio,
>                          continue;
>                  }
>
> +               /*
> +                * Skip the non-shared anonymous folio mapped solely by
> +                * the single exiting process, and release it directly
> +                * in the process exiting.
> +                */
> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
> +                    test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> +                   folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> +                   !folio_likely_mapped_shared(folio)) {
> +                       pra->referenced = -1;
> +                       page_vma_mapped_walk_done(&pvmw);
> +                       return false;
> +               }
> +
>                  if (pvmw.pte) {
>                          if (lru_gen_enabled() &&
>                              pte_young(ptep_get(pvmw.pte))) {
>
>
> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
> &vma->vm_mm->flags) is correct (I think it is wrong). For example,
> global_init can directly have it:
>                  if (is_global_init(p)) {
>                          can_oom_reap = false;
>                          set_bit(MMF_OOM_SKIP, &mm->flags);
>                          pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
>                                          task_pid_nr(victim), victim->comm,
>                                          task_pid_nr(p), p->comm);
>                          continue;
>                  }
>
> And exit_mmap() automatically has MMF_OOM_SKIP.
>
> What is the purpose of this check? Is there a better way to determine
> if a process is an OOM target? What about check_stable_address_space()?
1.Sorry, I overlooked the is_global_init(p) case; MMF_OOM_SKIP is
indeed not suitable.

2.check_stable_address_space() can indicate the oom_reaper case, but it
seems unable to identify the situation where the process exits normally.
What about task_is_dying()?

static inline bool task_is_dying(void)
{
	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
		(current->flags & PF_EXITING);
}

Thanks
>
>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 0761f91b407f..bae7a8bf6b3d
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -863,7 +863,12 @@ static enum folio_references folio_check_references(struct folio *folio,
>>          if (vm_flags & VM_LOCKED)
>>                  return FOLIOREF_ACTIVATE;
>>
>> -       /* rmap lock contention: rotate */
>> +       /*
>> +        * There are two cases to consider.
>> +        * 1) Rmap lock contention: rotate.
>> +        * 2) Skip the non-shared anonymous folio mapped solely by
>> +        *    the single exiting process.
>> +        */
>>          if (referenced_ptes == -1)
>>                  return FOLIOREF_KEEP;
>>
>> --
>> 2.39.0
>>
> Thanks
> Barry
zhiguojiang July 8, 2024, 12:25 p.m. UTC | #7
On 2024/7/8 20:17, zhiguojiang wrote:
>
>
> On 2024/7/8 19:02, Barry Song wrote:
>> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com> 
>> wrote:
>>> The releasing process of the non-shared anonymous folio mapped 
>>> solely by
>>> an exiting process may go through two flows: 1) the anonymous folio is
>>> firstly is swaped-out into swapspace and transformed into a swp_entry
>>> in shrink_folio_list; 2) then the swp_entry is released in the process
>>> exiting flow. This will increase the cpu load of releasing a non-shared
>>> anonymous folio mapped solely by an exiting process, because the folio
>>> go through swap-out and the releasing the swapspace and swp_entry.
>>>
>>> When system is low memory, it is more likely to occur, because more
>>> backend applidatuions will be killed.
>>>
>>> The modification is that shrink skips the non-shared anonymous folio
>>> solely mapped by an exting process and the folio is only released
>>> directly in the process exiting flow, which will save swap-out time
>>> and alleviate the load of the process exiting.
>>>
>>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>>> ---
>>>
>>> Change log:
>>> v4->v5:
>>> 1.Modify to skip non-shared anonymous folio only.
>>> 2.Update comments for pra->referenced = -1.
>>> v3->v4:
>>> 1.Modify that the unshared folios mapped only in exiting task are skip.
>>> v2->v3:
>>> Nothing.
>>> v1->v2:
>>> 1.The VM_EXITING added in v1 patch is removed, because it will fail
>>> to compile in 32-bit system.
>>>
>>>   mm/rmap.c   | 13 +++++++++++++
>>>   mm/vmscan.c |  7 ++++++-
>>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 26806b49a86f..5b5281d71dbb
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio 
>>> *folio,
>>>          int referenced = 0;
>>>          unsigned long start = address, ptes = 0;
>>>
>>> +       /*
>>> +        * Skip the non-shared anonymous folio mapped solely by
>>> +        * the single exiting process, and release it directly
>>> +        * in the process exiting.
>>> +        */
>>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
>>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>>> +               folio_test_anon(folio) && 
>>> folio_test_swapbacked(folio) &&
>>> +               !folio_likely_mapped_shared(folio)) {
>>> +               pra->referenced = -1;
>>> +               return false;
>>> +       }
>>> +
>>>          while (page_vma_mapped_walk(&pvmw)) {
>>>                  address = pvmw.address;
> Sure, I agree with your suggested modification. This way, taking the
> PTL first indeed ensures that the folio is actually mapped by this
> process.
> Thanks
>> As David suggested, what about the below?
>>
>> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio 
>> *folio,
>>                          continue;
>>                  }
>>
>> +               /*
>> +                * Skip the non-shared anonymous folio mapped solely by
>> +                * the single exiting process, and release it directly
>> +                * in the process exiting.
>> +                */
>> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
>> +                    test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>> +                   folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>> +                   !folio_likely_mapped_shared(folio)) {
>> +                       pra->referenced = -1;
>> +                       page_vma_mapped_walk_done(&pvmw);
>> +                       return false;
>> +               }
>> +
>>                  if (pvmw.pte) {
>>                          if (lru_gen_enabled() &&
>>                              pte_young(ptep_get(pvmw.pte))) {
>>
>>
>> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
>> &vma->vm_mm->flags) is correct (I think it is wrong). For example,
>> global_init can directly have it:
>>                  if (is_global_init(p)) {
>>                          can_oom_reap = false;
>>                          set_bit(MMF_OOM_SKIP, &mm->flags);
>>                          pr_info("oom killer %d (%s) has mm pinned by 
>> %d (%s)\n",
>>                                          task_pid_nr(victim), 
>> victim->comm,
>>                                          task_pid_nr(p), p->comm);
>>                          continue;
>>                  }
>>
>> And exit_mmap() automatically has MMF_OOM_SKIP.
>>
>> What is the purpose of this check? Is there a better way to determine
>> if a process is an OOM target? What about check_stable_address_space()?
> 1.Sorry, I overlooked the is_global_init(p) case; MMF_OOM_SKIP is
> indeed not suitable.
>
> 2.check_stable_address_space() can indicate the oom_reaper case, but
> it seems unable to identify the situation where the process exits
> normally. What about task_is_dying()?
>
> static inline bool task_is_dying(void)
> {
> 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> 		(current->flags & PF_EXITING);
> }
>
> Thanks
We can migrate task_is_dying() from mm/memcontrol.c to include/linux/oom.h
> static inline bool task_is_dying(void)
> {
>     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
>         (current->flags & PF_EXITING);
> }

>>
>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 0761f91b407f..bae7a8bf6b3d
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -863,7 +863,12 @@ static enum folio_references 
>>> folio_check_references(struct folio *folio,
>>>          if (vm_flags & VM_LOCKED)
>>>                  return FOLIOREF_ACTIVATE;
>>>
>>> -       /* rmap lock contention: rotate */
>>> +       /*
>>> +        * There are two cases to consider.
>>> +        * 1) Rmap lock contention: rotate.
>>> +        * 2) Skip the non-shared anonymous folio mapped solely by
>>> +        *    the single exiting process.
>>> +        */
>>>          if (referenced_ptes == -1)
>>>                  return FOLIOREF_KEEP;
>>>
>>> -- 
>>> 2.39.0
>>>
>> Thanks
>> Barry
>
Barry Song July 8, 2024, 12:41 p.m. UTC | #8
zhiguojiang <justinjiang@vivo.com> wrote on Tue, Jul 9, 2024 at 00:25:

>
>
> On 2024/7/8 20:17, zhiguojiang wrote:
> >
> >
> > On 2024/7/8 19:02, Barry Song wrote:
> >> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com>
> >> wrote:
> >>> The releasing process of the non-shared anonymous folio mapped
> >>> solely by
> >>> an exiting process may go through two flows: 1) the anonymous folio is
> >>> firstly is swaped-out into swapspace and transformed into a swp_entry
> >>> in shrink_folio_list; 2) then the swp_entry is released in the process
> >>> exiting flow. This will increase the cpu load of releasing a non-shared
> >>> anonymous folio mapped solely by an exiting process, because the folio
> >>> go through swap-out and the releasing the swapspace and swp_entry.
> >>>
> >>> When system is low memory, it is more likely to occur, because more
> >>> backend applidatuions will be killed.
> >>>
> >>> The modification is that shrink skips the non-shared anonymous folio
> >>> solely mapped by an exting process and the folio is only released
> >>> directly in the process exiting flow, which will save swap-out time
> >>> and alleviate the load of the process exiting.
> >>>
> >>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> >>> ---
> >>>
> >>> Change log:
> >>> v4->v5:
> >>> 1.Modify to skip non-shared anonymous folio only.
> >>> 2.Update comments for pra->referenced = -1.
> >>> v3->v4:
> >>> 1.Modify that the unshared folios mapped only in exiting task are skip.
> >>> v2->v3:
> >>> Nothing.
> >>> v1->v2:
> >>> 1.The VM_EXITING added in v1 patch is removed, because it will fail
> >>> to compile in 32-bit system.
> >>>
> >>>   mm/rmap.c   | 13 +++++++++++++
> >>>   mm/vmscan.c |  7 ++++++-
> >>>   2 files changed, 19 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 26806b49a86f..5b5281d71dbb
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio
> >>> *folio,
> >>>          int referenced = 0;
> >>>          unsigned long start = address, ptes = 0;
> >>>
> >>> +       /*
> >>> +        * Skip the non-shared anonymous folio mapped solely by
> >>> +        * the single exiting process, and release it directly
> >>> +        * in the process exiting.
> >>> +        */
> >>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
> >>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> >>> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> >>> +               !folio_likely_mapped_shared(folio)) {
> >>> +               pra->referenced = -1;
> >>> +               return false;
> >>> +       }
> >>> +
> >>>          while (page_vma_mapped_walk(&pvmw)) {
> >>>                  address = pvmw.address;
> > Sure, I agree with your modification suggestions. This way, using PTL
> > indeed ensures that the folio is mapped by this process.
> > Thanks
> >> As David suggested, what about the below?
> >>
> >> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio
> >> *folio,
> >>                          continue;
> >>                  }
> >>
> >> +               /*
> >> +                * Skip the non-shared anonymous folio mapped solely by
> >> +                * the single exiting process, and release it directly
> >> +                * in the process exiting.
> >> +                */
> >> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
> >> +                                       test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> >> +                               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> >> +                               !folio_likely_mapped_shared(folio)) {
> >> +                       pra->referenced = -1;
> >> +                       page_vma_mapped_walk_done(&pvmw);
> >> +                       return false;
> >> +               }
> >> +
> >>                  if (pvmw.pte) {
> >>                          if (lru_gen_enabled() &&
> >>                              pte_young(ptep_get(pvmw.pte))) {
> >>
> >>
> >> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
> >> &vma->vm_mm->flags) is
> >> correct (I think it is wrong).   For example, global_init can
> >> directly have it:
> >>                  if (is_global_init(p)) {
> >>                          can_oom_reap = false;
> >>                          set_bit(MMF_OOM_SKIP, &mm->flags);
> >>                          pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
> >>                                          task_pid_nr(victim), victim->comm,
> >>                                          task_pid_nr(p), p->comm);
> >>                          continue;
> >>                  }
> >>
> >> And exit_mmap() automatically has MMF_OOM_SKIP.
> >>
> >> What is the purpose of this check? Is there a better way to determine
> >> if a process is an
> >> OOM target? What about check_stable_address_space() ?
> > 1.Sorry, I overlook the situation with if (is_global_init(p)),
> > MMF_OOM_SKIP is indeed not suitable.
> >
> > 2.check_stable_address_space() can indicate oom_reaper, but it seems
> > unable to identify the situation where the process exits normally.
> > What about task_is_dying()?
> >
> > static inline bool task_is_dying(void)
> > {
> >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> >         (current->flags & PF_EXITING);
> > }
> >
> > Thanks
> We can migrate task_is_dying() from mm/memcontrol.c to include/linux/oom.h
> > static inline bool task_is_dying(void)
> > {
> >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> >         (current->flags & PF_EXITING);
> > }
>

no. current is kswapd.
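To unpack that objection: the rmap walk in folio_referenced_one() runs in the context of the reclaiming task (kswapd for background reclaim), so any test against current inspects the wrong task. Roughly (a simplified sketch, not the exact call chain):

```
kswapd()                                  /* current == kswapd, not the exiting task */
  -> shrink_folio_list()
       -> folio_check_references(folio)
            -> folio_referenced(folio)    /* rmap walk over every mapping */
                 -> folio_referenced_one(folio, vma, ...)
                    /* the exiting task is the owner of vma->vm_mm;
                     * task_is_dying() tests current, i.e. kswapd */
```

That is why the checks under discussion are phrased against vma->vm_mm (mm_users, mm flags) rather than against current.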


> >>
> >>
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index 0761f91b407f..bae7a8bf6b3d
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -863,7 +863,12 @@ static enum folio_references
> >>> folio_check_references(struct folio *folio,
> >>>          if (vm_flags & VM_LOCKED)
> >>>                  return FOLIOREF_ACTIVATE;
> >>>
> >>> -       /* rmap lock contention: rotate */
> >>> +       /*
> >>> +        * There are two cases to consider.
> >>> +        * 1) Rmap lock contention: rotate.
> >>> +        * 2) Skip the non-shared anonymous folio mapped solely by
> >>> +        *    the single exiting process.
> >>> +        */
> >>>          if (referenced_ptes == -1)
> >>>                  return FOLIOREF_KEEP;
> >>>
> >>> --
> >>> 2.39.0
> >>>
> >> Thanks
> >> Barry
> >
>
>
zhiguojiang July 8, 2024, 1:11 p.m. UTC | #9
On 2024/7/8 20:41, Barry Song wrote:
>
>
> zhiguojiang <justinjiang@vivo.com> wrote on Tue, Jul 9, 2024 at 00:25:
>
>
>
>     On 2024/7/8 20:17, zhiguojiang wrote:
>     >
>     >
>     > On 2024/7/8 19:02, Barry Song wrote:
>     >> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com>
>     >> wrote:
>     >>> The releasing process of the non-shared anonymous folio mapped
>     >>> solely by
>     >>> an exiting process may go through two flows: 1) the anonymous
>     folio is
>     >>> firstly is swaped-out into swapspace and transformed into a
>     swp_entry
>     >>> in shrink_folio_list; 2) then the swp_entry is released in the
>     process
>     >>> exiting flow. This will increase the cpu load of releasing a
>     non-shared
>     >>> anonymous folio mapped solely by an exiting process, because
>     the folio
>     >>> go through swap-out and the releasing the swapspace and swp_entry.
>     >>>
>     >>> When system is low memory, it is more likely to occur, because
>     more
>     >>> backend applidatuions will be killed.
>     >>>
>     >>> The modification is that shrink skips the non-shared anonymous
>     folio
>     >>> solely mapped by an exting process and the folio is only released
>     >>> directly in the process exiting flow, which will save swap-out
>     time
>     >>> and alleviate the load of the process exiting.
>     >>>
>     >>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>     >>> ---
>     >>>
>     >>> Change log:
>     >>> v4->v5:
>     >>> 1.Modify to skip non-shared anonymous folio only.
>     >>> 2.Update comments for pra->referenced = -1.
>     >>> v3->v4:
>     >>> 1.Modify that the unshared folios mapped only in exiting task
>     are skip.
>     >>> v2->v3:
>     >>> Nothing.
>     >>> v1->v2:
>     >>> 1.The VM_EXITING added in v1 patch is removed, because it will
>     fail
>     >>> to compile in 32-bit system.
>     >>>
>     >>>   mm/rmap.c   | 13 +++++++++++++
>     >>>   mm/vmscan.c |  7 ++++++-
>     >>>   2 files changed, 19 insertions(+), 1 deletion(-)
>     >>>
>     >>> diff --git a/mm/rmap.c b/mm/rmap.c
>     >>> index 26806b49a86f..5b5281d71dbb
>     >>> --- a/mm/rmap.c
>     >>> +++ b/mm/rmap.c
>     >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct
>     folio
>     >>> *folio,
>     >>>          int referenced = 0;
>     >>>          unsigned long start = address, ptes = 0;
>     >>>
>     >>> +       /*
>     >>> +        * Skip the non-shared anonymous folio mapped solely by
>     >>> +        * the single exiting process, and release it directly
>     >>> +        * in the process exiting.
>     >>> +        */
>     >>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
>     >>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>     >>> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>     >>> +               !folio_likely_mapped_shared(folio)) {
>     >>> +               pra->referenced = -1;
>     >>> +               return false;
>     >>> +       }
>     >>> +
>     >>>          while (page_vma_mapped_walk(&pvmw)) {
>     >>>                  address = pvmw.address;
>     > Sure, I agree with your modification suggestions. This way, using PTL
>     > indeed ensures that the folio is mapped by this process.
>     > Thanks
>     >> As David suggested, what about the below?
>     >>
>     >> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio
>     >> *folio,
>     >>                          continue;
>     >>                  }
>     >>
>     >> +               /*
>     >> +                * Skip the non-shared anonymous folio mapped
>     solely by
>     >> +                * the single exiting process, and release it
>     directly
>     >> +                * in the process exiting.
>     >> +                */
>     >> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
>     >> +                                       test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>     >> +                               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>     >> +                               !folio_likely_mapped_shared(folio)) {
>     >> +                       pra->referenced = -1;
>     >> +                       page_vma_mapped_walk_done(&pvmw);
>     >> +                       return false;
>     >> +               }
>     >> +
>     >>                  if (pvmw.pte) {
>     >>                          if (lru_gen_enabled() &&
>     >>                              pte_young(ptep_get(pvmw.pte))) {
>     >>
>     >>
>     >> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
>     >> &vma->vm_mm->flags) is
>     >> correct (I think it is wrong).   For example, global_init can
>     >> directly have it:
>     >>                  if (is_global_init(p)) {
>     >>                          can_oom_reap = false;
>     >>                          set_bit(MMF_OOM_SKIP, &mm->flags);
>     >>                          pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
>     >>                                          task_pid_nr(victim), victim->comm,
>     >>                                          task_pid_nr(p), p->comm);
>     >>                          continue;
>     >>                  }
>     >>
>     >> And exit_mmap() automatically has MMF_OOM_SKIP.
>     >>
>     >> What is the purpose of this check? Is there a better way to determine
>     >> if a process is an
>     >> OOM target? What about check_stable_address_space() ?
>     > 1.Sorry, I overlook the situation with if (is_global_init(p)),
>     > MMF_OOM_SKIP is indeed not suitable.
>     >
>     > 2.check_stable_address_space() can indicate oom_reaper, but it seems
>     > unable to identify the situation where the process exits normally.
>     > What about task_is_dying()?
>     >
>     > static inline bool task_is_dying(void)
>     > {
>     >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
>     >         (current->flags & PF_EXITING);
>     > }
>     >
>     > Thanks
>     We can migrate task_is_dying() from mm/memcontrol.c to
>     include/linux/oom.h
>     > static inline bool task_is_dying(void)
>     > {
>     >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
>     >         (current->flags & PF_EXITING);
>     > }
>
>
> no. current is kswapd.
Hi Barry,

It seems feasible for check_stable_address_space() to replace MMF_OOM_SKIP:
check_stable_address_space() covers the OOM-kill case, and
!atomic_read(&vma->vm_mm->mm_users) covers the normal process exit.

         /*
          * Skip the non-shared anonymous folio mapped solely by
          * the single exiting process, and release it directly
          * in the process exiting.
          */
         if ((!atomic_read(&vma->vm_mm->mm_users) ||
             check_stable_address_space(vma->vm_mm)) &&
             folio_test_anon(folio) && folio_test_swapbacked(folio) &&
             !folio_likely_mapped_shared(folio)) {
             pra->referenced = -1;
             page_vma_mapped_walk_done(&pvmw);
             return false;
         }

Thanks
Zhiguo
>
>
>     >>
>     >>
>     >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>     >>> index 0761f91b407f..bae7a8bf6b3d
>     >>> --- a/mm/vmscan.c
>     >>> +++ b/mm/vmscan.c
>     >>> @@ -863,7 +863,12 @@ static enum folio_references
>     >>> folio_check_references(struct folio *folio,
>     >>>          if (vm_flags & VM_LOCKED)
>     >>>                  return FOLIOREF_ACTIVATE;
>     >>>
>     >>> -       /* rmap lock contention: rotate */
>     >>> +       /*
>     >>> +        * There are two cases to consider.
>     >>> +        * 1) Rmap lock contention: rotate.
>     >>> +        * 2) Skip the non-shared anonymous folio mapped solely by
>     >>> +        *    the single exiting process.
>     >>> +        */
>     >>>          if (referenced_ptes == -1)
>     >>>                  return FOLIOREF_KEEP;
>     >>>
>     >>> --
>     >>> 2.39.0
>     >>>
>     >> Thanks
>     >> Barry
>     >
>
Barry Song July 8, 2024, 9:34 p.m. UTC | #10
On Tue, Jul 9, 2024 at 1:11 AM zhiguojiang <justinjiang@vivo.com> wrote:
>
>
>
> On 2024/7/8 20:41, Barry Song wrote:
> >
> >
> > zhiguojiang <justinjiang@vivo.com> wrote on Tue, Jul 9, 2024 at 00:25:
> >
> >
> >
> >     On 2024/7/8 20:17, zhiguojiang wrote:
> >     >
> >     >
> >     > On 2024/7/8 19:02, Barry Song wrote:
> >     >> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com>
> >     >> wrote:
> >     >>> The releasing process of the non-shared anonymous folio mapped
> >     >>> solely by
> >     >>> an exiting process may go through two flows: 1) the anonymous
> >     folio is
> >     >>> firstly is swaped-out into swapspace and transformed into a
> >     swp_entry
> >     >>> in shrink_folio_list; 2) then the swp_entry is released in the
> >     process
> >     >>> exiting flow. This will increase the cpu load of releasing a
> >     non-shared
> >     >>> anonymous folio mapped solely by an exiting process, because
> >     the folio
> >     >>> go through swap-out and the releasing the swapspace and swp_entry.
> >     >>>
> >     >>> When system is low memory, it is more likely to occur, because
> >     more
> >     >>> backend applidatuions will be killed.
> >     >>>
> >     >>> The modification is that shrink skips the non-shared anonymous
> >     folio
> >     >>> solely mapped by an exting process and the folio is only released
> >     >>> directly in the process exiting flow, which will save swap-out
> >     time
> >     >>> and alleviate the load of the process exiting.
> >     >>>
> >     >>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
> >     >>> ---
> >     >>>
> >     >>> Change log:
> >     >>> v4->v5:
> >     >>> 1.Modify to skip non-shared anonymous folio only.
> >     >>> 2.Update comments for pra->referenced = -1.
> >     >>> v3->v4:
> >     >>> 1.Modify that the unshared folios mapped only in exiting task
> >     are skip.
> >     >>> v2->v3:
> >     >>> Nothing.
> >     >>> v1->v2:
> >     >>> 1.The VM_EXITING added in v1 patch is removed, because it will
> >     fail
> >     >>> to compile in 32-bit system.
> >     >>>
> >     >>>   mm/rmap.c   | 13 +++++++++++++
> >     >>>   mm/vmscan.c |  7 ++++++-
> >     >>>   2 files changed, 19 insertions(+), 1 deletion(-)
> >     >>>
> >     >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >     >>> index 26806b49a86f..5b5281d71dbb
> >     >>> --- a/mm/rmap.c
> >     >>> +++ b/mm/rmap.c
> >     >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct
> >     folio
> >     >>> *folio,
> >     >>>          int referenced = 0;
> >     >>>          unsigned long start = address, ptes = 0;
> >     >>>
> >     >>> +       /*
> >     >>> +        * Skip the non-shared anonymous folio mapped solely by
> >     >>> +        * the single exiting process, and release it directly
> >     >>> +        * in the process exiting.
> >     >>> +        */
> >     >>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
> >     >>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> >     >>> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> >     >>> +               !folio_likely_mapped_shared(folio)) {
> >     >>> +               pra->referenced = -1;
> >     >>> +               return false;
> >     >>> +       }
> >     >>> +
> >     >>>          while (page_vma_mapped_walk(&pvmw)) {
> >     >>>                  address = pvmw.address;
> >     > Sure, I agree with your modification suggestions. This way, using PTL
> >     > indeed ensures that the folio is mapped by this process.
> >     > Thanks
> >     >> As David suggested, what about the below?
> >     >>
> >     >> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio
> >     >> *folio,
> >     >>                          continue;
> >     >>                  }
> >     >>
> >     >> +               /*
> >     >> +                * Skip the non-shared anonymous folio mapped
> >     solely by
> >     >> +                * the single exiting process, and release it
> >     directly
> >     >> +                * in the process exiting.
> >     >> +                */
> >     >> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
> >     >> +                                       test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> >     >> +                               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> >     >> +                               !folio_likely_mapped_shared(folio)) {
> >     >> +                       pra->referenced = -1;
> >     >> +                       page_vma_mapped_walk_done(&pvmw);
> >     >> +                       return false;
> >     >> +               }
> >     >> +
> >     >>                  if (pvmw.pte) {
> >     >>                          if (lru_gen_enabled() &&
> >     >>                              pte_young(ptep_get(pvmw.pte))) {
> >     >>
> >     >>
> >     >> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
> >     >> &vma->vm_mm->flags) is
> >     >> correct (I think it is wrong).   For example, global_init can
> >     >> directly have it:
> >     >>                  if (is_global_init(p)) {
> >     >>                          can_oom_reap = false;
> >     >>                          set_bit(MMF_OOM_SKIP, &mm->flags);
> >     >>                          pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
> >     >>                                          task_pid_nr(victim), victim->comm,
> >     >>                                          task_pid_nr(p), p->comm);
> >     >>                          continue;
> >     >>                  }
> >     >>
> >     >> And exit_mmap() automatically has MMF_OOM_SKIP.
> >     >>
> >     >> What is the purpose of this check? Is there a better way to determine
> >     >> if a process is an
> >     >> OOM target? What about check_stable_address_space() ?
> >     > 1.Sorry, I overlook the situation with if (is_global_init(p)),
> >     > MMF_OOM_SKIP is indeed not suitable.
> >     >
> >     > 2.check_stable_address_space() can indicate oom_reaper, but it seems
> >     > unable to identify the situation where the process exits normally.
> >     > What about task_is_dying()?
> >     >
> >     > static inline bool task_is_dying(void)
> >     > {
> >     >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> >     >         (current->flags & PF_EXITING);
> >     > }
> >     >
> >     > Thanks
> >     We can migrate task_is_dying() from mm/memcontrol.c to
> >     include/linux/oom.h
> >     > static inline bool task_is_dying(void)
> >     > {
> >     >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> >     >         (current->flags & PF_EXITING);
> >     > }
> >
> >
> > no. current is kswapd.
> Hi Barry,
>
> It seems feasible for check_stable_address_space() replacing MMF_OOM_SKIP.
> check_stable_address_space() can indicate oom kill, and
> !atomic_read(&vma->vm_mm->mm_users)
> can indicate the normal process exiting.
>
>          /*
>           * Skip the non-shared anonymous folio mapped solely by
>           * the single exiting process, and release it directly
>           * in the process exiting.
>           */
>          if ((!atomic_read(&vma->vm_mm->mm_users) ||
>              check_stable_address_space(vma->vm_mm)) &&
>              folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>              !folio_likely_mapped_shared(folio)) {
>              pra->referenced = -1;
>              page_vma_mapped_walk_done(&pvmw);
>              return false;
>          }
>

Yes, + David, Willy (when you send a new version, please CC people who have
participated and describe how you have addressed comments from those
people.)

I also think we actually can remove "folio_test_anon(folio)".

So it could be:

@@ -883,6 +871,21 @@ static bool folio_referenced_one(struct folio *folio,
                        continue;
                }

+               /*
+                * Skip the non-shared swapbacked folio mapped solely by
+                * the exiting or OOM-reaped process. This avoids redundant
+                * swap-out followed by an immediate unmap.
+                */
+               if ((!atomic_read(&vma->vm_mm->mm_users) ||
+                    check_stable_address_space(vma->vm_mm)) &&
+                    folio_test_swapbacked(folio) &&
+                    !folio_likely_mapped_shared(folio)) {
+                       pra->referenced = -1;
+                       page_vma_mapped_walk_done(&pvmw);
+                       return false;
+               }
+
                if (pvmw.pte) {
                        if (lru_gen_enabled() &&
                            pte_young(ptep_get(pvmw.pte))) {

> Thanks
> Zhiguo
> >
> >
> >     >>
> >     >>
> >     >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >     >>> index 0761f91b407f..bae7a8bf6b3d
> >     >>> --- a/mm/vmscan.c
> >     >>> +++ b/mm/vmscan.c
> >     >>> @@ -863,7 +863,12 @@ static enum folio_references
> >     >>> folio_check_references(struct folio *folio,
> >     >>>          if (vm_flags & VM_LOCKED)
> >     >>>                  return FOLIOREF_ACTIVATE;
> >     >>>
> >     >>> -       /* rmap lock contention: rotate */
> >     >>> +       /*
> >     >>> +        * There are two cases to consider.
> >     >>> +        * 1) Rmap lock contention: rotate.
> >     >>> +        * 2) Skip the non-shared anonymous folio mapped solely by
> >     >>> +        *    the single exiting process.
> >     >>> +        */
> >     >>>          if (referenced_ptes == -1)
> >     >>>                  return FOLIOREF_KEEP;
> >     >>>
> >     >>> --
> >     >>> 2.39.0
> >     >>>
> >     >> Thanks
> >     >> Barry
> >     >
> >
>

Thanks
Barry
zhiguojiang July 9, 2024, 4:23 a.m. UTC | #11
On 2024/7/9 5:34, Barry Song wrote:
> On Tue, Jul 9, 2024 at 1:11 AM zhiguojiang <justinjiang@vivo.com> wrote:
>>
>>
>> On 2024/7/8 20:41, Barry Song wrote:
>>>
>>> zhiguojiang <justinjiang@vivo.com> wrote on Tue, Jul 9, 2024 at 00:25:
>>>
>>>
>>>
>>>      On 2024/7/8 20:17, zhiguojiang wrote:
>>>      >
>>>      >
>>>      > On 2024/7/8 19:02, Barry Song wrote:
>>>      >> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@vivo.com>
>>>      >> wrote:
>>>      >>> The releasing process of the non-shared anonymous folio mapped
>>>      >>> solely by
>>>      >>> an exiting process may go through two flows: 1) the anonymous
>>>      folio is
>>>      >>> firstly is swaped-out into swapspace and transformed into a
>>>      swp_entry
>>>      >>> in shrink_folio_list; 2) then the swp_entry is released in the
>>>      process
>>>      >>> exiting flow. This will increase the cpu load of releasing a
>>>      non-shared
>>>      >>> anonymous folio mapped solely by an exiting process, because
>>>      the folio
>>>      >>> go through swap-out and the releasing the swapspace and swp_entry.
>>>      >>>
>>>      >>> When system is low memory, it is more likely to occur, because
>>>      more
>>>      >>> backend applidatuions will be killed.
>>>      >>>
>>>      >>> The modification is that shrink skips the non-shared anonymous
>>>      folio
>>>      >>> solely mapped by an exting process and the folio is only released
>>>      >>> directly in the process exiting flow, which will save swap-out
>>>      time
>>>      >>> and alleviate the load of the process exiting.
>>>      >>>
>>>      >>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>>>      >>> ---
>>>      >>>
>>>      >>> Change log:
>>>      >>> v4->v5:
>>>      >>> 1.Modify to skip non-shared anonymous folio only.
>>>      >>> 2.Update comments for pra->referenced = -1.
>>>      >>> v3->v4:
>>>      >>> 1.Modify that the unshared folios mapped only in exiting task
>>>      are skip.
>>>      >>> v2->v3:
>>>      >>> Nothing.
>>>      >>> v1->v2:
>>>      >>> 1.The VM_EXITING added in v1 patch is removed, because it will
>>>      fail
>>>      >>> to compile in 32-bit system.
>>>      >>>
>>>      >>>   mm/rmap.c   | 13 +++++++++++++
>>>      >>>   mm/vmscan.c |  7 ++++++-
>>>      >>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>>      >>>
>>>      >>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>      >>> index 26806b49a86f..5b5281d71dbb
>>>      >>> --- a/mm/rmap.c
>>>      >>> +++ b/mm/rmap.c
>>>      >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct
>>>      folio
>>>      >>> *folio,
>>>      >>>          int referenced = 0;
>>>      >>>          unsigned long start = address, ptes = 0;
>>>      >>>
>>>      >>> +       /*
>>>      >>> +        * Skip the non-shared anonymous folio mapped solely by
>>>      >>> +        * the single exiting process, and release it directly
>>>      >>> +        * in the process exiting.
>>>      >>> +        */
>>>      >>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
>>>      >>> +               test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>>>      >>> +               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>>>      >>> +               !folio_likely_mapped_shared(folio)) {
>>>      >>> +               pra->referenced = -1;
>>>      >>> +               return false;
>>>      >>> +       }
>>>      >>> +
>>>      >>>          while (page_vma_mapped_walk(&pvmw)) {
>>>      >>>                  address = pvmw.address;
>>>      > Sure, I agree with your modification suggestions. This way, using PTL
>>>      > indeed ensures that the folio is mapped by this process.
>>>      > Thanks
>>>      >> As David suggested, what about the below?
>>>      >>
>>>      >> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio
>>>      >> *folio,
>>>      >>                          continue;
>>>      >>                  }
>>>      >>
>>>      >> +               /*
>>>      >> +                * Skip the non-shared anonymous folio mapped
>>>      solely by
>>>      >> +                * the single exiting process, and release it
>>>      directly
>>>      >> +                * in the process exiting.
>>>      >> +                */
>>>      >> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
>>>      >> +                                       test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
>>>      >> +                               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>>>      >> +                               !folio_likely_mapped_shared(folio)) {
>>>      >> +                       pra->referenced = -1;
>>>      >> +                       page_vma_mapped_walk_done(&pvmw);
>>>      >> +                       return false;
>>>      >> +               }
>>>      >> +
>>>      >>                  if (pvmw.pte) {
>>>      >>                          if (lru_gen_enabled() &&
>>>      >>                              pte_young(ptep_get(pvmw.pte))) {
>>>      >>
>>>      >>
>>>      >> By the way, I am not convinced that using test_bit(MMF_OOM_SKIP,
>>>      >> &vma->vm_mm->flags) is
>>>      >> correct (I think it is wrong).   For example, global_init can
>>>      >> directly have it:
>>>      >>                  if (is_global_init(p)) {
>>>      >>                          can_oom_reap = false;
>>>      >>                          set_bit(MMF_OOM_SKIP, &mm->flags);
>>>      >>                          pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
>>>      >>                                          task_pid_nr(victim), victim->comm,
>>>      >>                                          task_pid_nr(p), p->comm);
>>>      >>                          continue;
>>>      >>                  }
>>>      >>
>>>      >> And exit_mmap() automatically has MMF_OOM_SKIP.
>>>      >>
>>>      >> What is the purpose of this check? Is there a better way to determine
>>>      >> if a process is an
>>>      >> OOM target? What about check_stable_address_space() ?
>>>      > 1.Sorry, I overlook the situation with if (is_global_init(p)),
>>>      > MMF_OOM_SKIP is indeed not suitable.
>>>      >
>>>      > 2.check_stable_address_space() can indicate oom_reaper, but it seems
>>>      > unable to identify the situation where the process exits normally.
>>>      > What about task_is_dying()?
>>>      >
>>>      > static inline bool task_is_dying(void)
>>>      > {
>>>      >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
>>>      >         (current->flags & PF_EXITING);
>>>      > }
>>>      >
>>>      > Thanks
>>>      We can migrate task_is_dying() from mm/memcontrol.c to
>>>      include/linux/oom.h
>>>      > static inline bool task_is_dying(void)
>>>      > {
>>>      >     return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
>>>      >         (current->flags & PF_EXITING);
>>>      > }
>>>
>>>
>>> no. current is kswapd.
>> Hi Barry,
>>
>> It seems feasible for check_stable_address_space() replacing MMF_OOM_SKIP.
>> check_stable_address_space() can indicate oom kill, and
>> !atomic_read(&vma->vm_mm->mm_users)
>> can indicate the normal process exiting.
>>
>>           /*
>>            * Skip the non-shared anonymous folio mapped solely by
>>            * the single exiting process, and release it directly
>>            * in the process exiting.
>>            */
>>           if ((!atomic_read(&vma->vm_mm->mm_users) ||
>>               check_stable_address_space(vma->vm_mm)) &&
>>               folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>>               !folio_likely_mapped_shared(folio)) {
>>               pra->referenced = -1;
>>               page_vma_mapped_walk_done(&pvmw);
>>               return false;
>>           }
>>
> Yes, + David, Willy (when you send a new version, please CC people who have
> participated and describe how you have addressed comments from those
> people.)
>
> I also think we actually can remove "folio_test_anon(folio)".
>
> So It could be,
>
> @@ -883,6 +871,21 @@ static bool folio_referenced_one(struct folio *folio,
>                          continue;
>                  }
>
> +               /*
> +                * Skip the non-shared swapbacked folio mapped solely by
> +                * the exiting or OOM-reaped process. This avoids redundant
> +                * swap-out followed by an immediate unmap.
> +                */
> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
> +                    check_stable_address_space(vma->vm_mm)) &&
> +                    folio_test_swapbacked(folio) &&
> +                    !folio_likely_mapped_shared(folio)) {
> +                       pra->referenced = -1;
> +                       page_vma_mapped_walk_done(&pvmw);
> +                       return false;
> +               }
> +
>                  if (pvmw.pte) {
>                          if (lru_gen_enabled() &&
>                              pte_young(ptep_get(pvmw.pte))) {
Ok, updated in patch v6:
https://lore.kernel.org/linux-mm/20240709042122.631-1-justinjiang@vivo.com/

Thanks
>
>> Thanks
>> Zhiguo
>>>
>>>      >>
>>>      >>
>>>      >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>      >>> index 0761f91b407f..bae7a8bf6b3d
>>>      >>> --- a/mm/vmscan.c
>>>      >>> +++ b/mm/vmscan.c
>>>      >>> @@ -863,7 +863,12 @@ static enum folio_references
>>>      >>> folio_check_references(struct folio *folio,
>>>      >>>          if (vm_flags & VM_LOCKED)
>>>      >>>                  return FOLIOREF_ACTIVATE;
>>>      >>>
>>>      >>> -       /* rmap lock contention: rotate */
>>>      >>> +       /*
>>>      >>> +        * There are two cases to consider.
>>>      >>> +        * 1) Rmap lock contention: rotate.
>>>      >>> +        * 2) Skip the non-shared anonymous folio mapped solely by
>>>      >>> +        *    the single exiting process.
>>>      >>> +        */
>>>      >>>          if (referenced_ptes == -1)
>>>      >>>                  return FOLIOREF_KEEP;
>>>      >>>
>>>      >>> --
>>>      >>> 2.39.0
>>>      >>>
>>>      >> Thanks
>>>      >> Barry
>>>      >
>>>
> Thanks
> Barry
diff mbox series

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index 26806b49a86f..5b5281d71dbb
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -843,6 +843,19 @@  static bool folio_referenced_one(struct folio *folio,
 	int referenced = 0;
 	unsigned long start = address, ptes = 0;
 
+	/*
+	 * Skip the non-shared anonymous folio mapped solely by
+	 * the single exiting process, and release it directly
+	 * in the process exiting.
+	 */
+	if ((!atomic_read(&vma->vm_mm->mm_users) ||
+		test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
+		folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+		!folio_likely_mapped_shared(folio)) {
+		pra->referenced = -1;
+		return false;
+	}
+
 	while (page_vma_mapped_walk(&pvmw)) {
 		address = pvmw.address;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0761f91b407f..bae7a8bf6b3d
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -863,7 +863,12 @@  static enum folio_references folio_check_references(struct folio *folio,
 	if (vm_flags & VM_LOCKED)
 		return FOLIOREF_ACTIVATE;
 
-	/* rmap lock contention: rotate */
+	/*
+	 * There are two cases to consider.
+	 * 1) Rmap lock contention: rotate.
+	 * 2) Skip the non-shared anonymous folio mapped solely by
+	 *    the single exiting process.
+	 */
 	if (referenced_ptes == -1)
 		return FOLIOREF_KEEP;