Message ID: 1717498121-20926-1-git-send-email-yangge1116@126.com (mailing list archive)
State: New
Series: mm/gup: don't check page lru flag before draining it
On 04.06.24 12:48, yangge1116@126.com wrote: > From: yangge <yangge1116@126.com> > > If a page is added in pagevec, its ref count increases one, remove > the page from pagevec decreases one. Page migration requires the > page is not referenced by others except page mapping. Before > migrating a page, we should try to drain the page from pagevec in > case the page is in it, however, folio_test_lru() is not sufficient > to tell whether the page is in pagevec or not, if the page is in > pagevec, the migration will fail. > > Remove the condition and drain lru once to ensure the page is not > referenced by pagevec. What you are saying is that we might have a page on which folio_test_lru() succeeds, that was added to one of the cpu_fbatches, correct? Can you describe under which circumstances that happens?
在 2024/6/4 下午9:47, David Hildenbrand 写道: > On 04.06.24 12:48, yangge1116@126.com wrote: >> From: yangge <yangge1116@126.com> >> >> If a page is added in pagevec, its ref count increases one, remove >> the page from pagevec decreases one. Page migration requires the >> page is not referenced by others except page mapping. Before >> migrating a page, we should try to drain the page from pagevec in >> case the page is in it, however, folio_test_lru() is not sufficient >> to tell whether the page is in pagevec or not, if the page is in >> pagevec, the migration will fail. >> >> Remove the condition and drain lru once to ensure the page is not >> referenced by pagevec. > > What you are saying is that we might have a page on which > folio_test_lru() succeeds, that was added to one of the cpu_fbatches, > correct? Yes > > Can you describe under which circumstances that happens? > If we call folio_activate() to move a page from inactive LRU list to active LRU list, the page is not only in LRU list, but also in one of the cpu_fbatches. void folio_activate(struct folio *folio) { if (folio_test_lru(folio) && !folio_test_active(folio) && !folio_test_unevictable(folio)) { struct folio_batch *fbatch; folio_get(folio); //After this, folio is in LRU list, and its ref count have increased one. local_lock(&cpu_fbatches.lock); fbatch = this_cpu_ptr(&cpu_fbatches.activate); folio_batch_add_and_move(fbatch, folio, folio_activate_fn); local_unlock(&cpu_fbatches.lock); } }
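For context on why that extra reference makes migration fail: the migration core compares the folio's refcount against the number of references it expects, and bails out on any mismatch. A heavily elided sketch, paraphrasing folio_expected_refs() and folio_migrate_mapping() in mm/migrate.c of that era (locking and identity transfer omitted):

static int folio_expected_refs(struct address_space *mapping,
			       struct folio *folio)
{
	int refs = 1;	/* the reference the migration path itself holds */

	if (!mapping)
		return refs;

	refs += folio_nr_pages(folio);	/* page cache references */
	if (folio_test_private(folio))
		refs++;			/* private data, e.g. buffer heads */

	return refs;
}

int folio_migrate_mapping(struct address_space *mapping,
		struct folio *newfolio, struct folio *folio, int extra_count)
{
	int expected_count = folio_expected_refs(mapping, folio) + extra_count;

	if (!mapping) {
		/*
		 * Anonymous folio with no mapping: the reference a per-CPU
		 * LRU batch still holds is invisible here; it simply makes
		 * folio_ref_count() one higher than expected_count, so
		 * migration keeps failing until the batch is drained.
		 */
		if (folio_ref_count(folio) != expected_count)
			return -EAGAIN;
		/* ... transfer identity to newfolio ... */
		return MIGRATEPAGE_SUCCESS;
	}
	/* ... same expected_count check, done under the i_pages lock ... */
	return MIGRATEPAGE_SUCCESS;
}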
On 05.06.24 03:18, yangge1116 wrote: > > > 在 2024/6/4 下午9:47, David Hildenbrand 写道: >> On 04.06.24 12:48, yangge1116@126.com wrote: >>> From: yangge <yangge1116@126.com> >>> >>> If a page is added in pagevec, its ref count increases one, remove >>> the page from pagevec decreases one. Page migration requires the >>> page is not referenced by others except page mapping. Before >>> migrating a page, we should try to drain the page from pagevec in >>> case the page is in it, however, folio_test_lru() is not sufficient >>> to tell whether the page is in pagevec or not, if the page is in >>> pagevec, the migration will fail. >>> >>> Remove the condition and drain lru once to ensure the page is not >>> referenced by pagevec. >> >> What you are saying is that we might have a page on which >> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >> correct? > > Yes > >> >> Can you describe under which circumstances that happens? >> > > If we call folio_activate() to move a page from inactive LRU list to > active LRU list, the page is not only in LRU list, but also in one of > the cpu_fbatches. > > void folio_activate(struct folio *folio) > { > if (folio_test_lru(folio) && !folio_test_active(folio) && > !folio_test_unevictable(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); > //After this, folio is in LRU list, and its ref count have > increased one. > > local_lock(&cpu_fbatches.lock); > fbatch = this_cpu_ptr(&cpu_fbatches.activate); > folio_batch_add_and_move(fbatch, folio, folio_activate_fn); > local_unlock(&cpu_fbatches.lock); > } > } Interesting, the !SMP variant does the folio_test_clear_lru(). It would be really helpful if we could reliably identify whether LRU batching code has a raised reference on a folio. We have the same scenario in * folio_deactivate() * folio_mark_lazyfree() In folio_batch_move_lru() we do the folio_test_clear_lru(folio). No expert on that code, I'm wondering if we could move the folio_test_clear_lru() out, such that we can more reliably identify whether a folio is on the LRU batch or not.
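For reference, the !SMP variant mentioned above looks roughly like this (paraphrased from mm/swap.c of the time): with no per-CPU batches, the move happens immediately and the LRU flag stays cleared for its whole duration, so a cleared flag reliably means someone owns the folio.

/* Paraphrased CONFIG_SMP=n variant from mm/swap.c: no batch exists, so
 * the folio is moved right away under the lruvec lock, with the LRU
 * flag held cleared across the move. */
void folio_activate(struct folio *folio)
{
	struct lruvec *lruvec;

	if (folio_test_clear_lru(folio)) {
		lruvec = folio_lruvec_lock_irq(folio);
		folio_activate_fn(lruvec, folio);
		unlock_page_lruvec_irq(lruvec);
		folio_set_lru(folio);
	}
}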
On 05.06.24 11:41, David Hildenbrand wrote: > On 05.06.24 03:18, yangge1116 wrote: >> >> >> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>> From: yangge <yangge1116@126.com> >>>> >>>> If a page is added in pagevec, its ref count increases one, remove >>>> the page from pagevec decreases one. Page migration requires the >>>> page is not referenced by others except page mapping. Before >>>> migrating a page, we should try to drain the page from pagevec in >>>> case the page is in it, however, folio_test_lru() is not sufficient >>>> to tell whether the page is in pagevec or not, if the page is in >>>> pagevec, the migration will fail. >>>> >>>> Remove the condition and drain lru once to ensure the page is not >>>> referenced by pagevec. >>> >>> What you are saying is that we might have a page on which >>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>> correct? >> >> Yes >> >>> >>> Can you describe under which circumstances that happens? >>> >> >> If we call folio_activate() to move a page from inactive LRU list to >> active LRU list, the page is not only in LRU list, but also in one of >> the cpu_fbatches. >> >> void folio_activate(struct folio *folio) >> { >> if (folio_test_lru(folio) && !folio_test_active(folio) && >> !folio_test_unevictable(folio)) { >> struct folio_batch *fbatch; >> >> folio_get(folio); >> //After this, folio is in LRU list, and its ref count have >> increased one. >> >> local_lock(&cpu_fbatches.lock); >> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >> local_unlock(&cpu_fbatches.lock); >> } >> } > > Interesting, the !SMP variant does the folio_test_clear_lru(). > > It would be really helpful if we could reliably identify whether LRU > batching code has a raised reference on a folio. > > We have the same scenario in > * folio_deactivate() > * folio_mark_lazyfree() > > In folio_batch_move_lru() we do the folio_test_clear_lru(folio). > > No expert on that code, I'm wondering if we could move the > folio_test_clear_lru() out, such that we can more reliably identify > whether a folio is on the LRU batch or not. I'm sure there would be something extremely broken with the following (I don't know what I'm doing ;) ), but I wonder if there would be a way to make something like that work (and perform well enough?). 
diff --git a/mm/swap.c b/mm/swap.c index 67786cb771305..642e471c3ec5a 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn) for (i = 0; i < folio_batch_count(fbatch); i++) { struct folio *folio = fbatch->folios[i]; - /* block memcg migration while the folio moves between lru */ - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) - continue; - folio_lruvec_relock_irqsave(folio, &lruvec, &flags); move_fn(lruvec, folio); @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio) */ void folio_rotate_reclaimable(struct folio *folio) { - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && - !folio_test_unevictable(folio) && folio_test_lru(folio)) { + if (folio_test_lru(folio) && !folio_test_locked(folio) && + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && + folio_test_clear_lru(folio)) { struct folio_batch *fbatch; unsigned long flags; @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) void folio_activate(struct folio *folio) { if (folio_test_lru(folio) && !folio_test_active(folio) && - !folio_test_unevictable(folio)) { + !folio_test_unevictable(folio) && folio_test_clear_lru(folio)) { struct folio_batch *fbatch; folio_get(folio); @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) /* Deactivating an unevictable folio will not accelerate reclaim */ if (folio_test_unevictable(folio)) return; + if (!folio_test_clear_lru(folio)) + return; folio_get(folio); local_lock(&cpu_fbatches.lock); @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) void folio_deactivate(struct folio *folio) { if (folio_test_lru(folio) && !folio_test_unevictable(folio) && - (folio_test_active(folio) || lru_gen_enabled())) { + (folio_test_active(folio) || lru_gen_enabled()) && + folio_test_clear_lru(folio)) { struct folio_batch *fbatch; folio_get(folio); @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) { if (folio_test_lru(folio) && folio_test_anon(folio) && folio_test_swapbacked(folio) && !folio_test_swapcache(folio) && - !folio_test_unevictable(folio)) { + !folio_test_unevictable(folio) && + folio_test_clear_lru(folio)) { struct folio_batch *fbatch; folio_get(folio);
On 2024/6/5 17:53, David Hildenbrand wrote: > On 05.06.24 11:41, David Hildenbrand wrote: >> On 05.06.24 03:18, yangge1116 wrote: >>> >>> >>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>> From: yangge <yangge1116@126.com> >>>>> >>>>> If a page is added in pagevec, its ref count increases one, remove >>>>> the page from pagevec decreases one. Page migration requires the >>>>> page is not referenced by others except page mapping. Before >>>>> migrating a page, we should try to drain the page from pagevec in >>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>> to tell whether the page is in pagevec or not, if the page is in >>>>> pagevec, the migration will fail. >>>>> >>>>> Remove the condition and drain lru once to ensure the page is not >>>>> referenced by pagevec. >>>> >>>> What you are saying is that we might have a page on which >>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>> correct? >>> >>> Yes >>> >>>> >>>> Can you describe under which circumstances that happens? >>>> >>> >>> If we call folio_activate() to move a page from inactive LRU list to >>> active LRU list, the page is not only in LRU list, but also in one of >>> the cpu_fbatches. >>> >>> void folio_activate(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>> !folio_test_unevictable(folio)) { >>> struct folio_batch *fbatch; >>> >>> folio_get(folio); >>> //After this, folio is in LRU list, and its ref count have >>> increased one. >>> >>> local_lock(&cpu_fbatches.lock); >>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >>> local_unlock(&cpu_fbatches.lock); >>> } >>> } >> >> Interesting, the !SMP variant does the folio_test_clear_lru(). >> >> It would be really helpful if we could reliably identify whether LRU >> batching code has a raised reference on a folio. >> >> We have the same scenario in >> * folio_deactivate() >> * folio_mark_lazyfree() >> >> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >> >> No expert on that code, I'm wondering if we could move the >> folio_test_clear_lru() out, such that we can more reliably identify >> whether a folio is on the LRU batch or not. > > I'm sure there would be something extremely broken with the following > (I don't know what I'm doing ;) ), but I wonder if there would be a way > to make something like that work (and perform well enough?). 
> > diff --git a/mm/swap.c b/mm/swap.c > index 67786cb771305..642e471c3ec5a 100644 > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch > *fbatch, move_fn_t move_fn) > for (i = 0; i < folio_batch_count(fbatch); i++) { > struct folio *folio = fbatch->folios[i]; > > - /* block memcg migration while the folio moves between > lru */ > - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) > - continue; > - > folio_lruvec_relock_irqsave(folio, &lruvec, &flags); > move_fn(lruvec, folio); > > @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, > struct folio *folio) > */ > void folio_rotate_reclaimable(struct folio *folio) > { > - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && > - !folio_test_unevictable(folio) && folio_test_lru(folio)) { > + if (folio_test_lru(folio) && !folio_test_locked(folio) && > + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > unsigned long flags; > > @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) > void folio_activate(struct folio *folio) > { > if (folio_test_lru(folio) && !folio_test_active(folio) && > - !folio_test_unevictable(folio)) { > + !folio_test_unevictable(folio) && > folio_test_clear_lru(folio)) { IMO, this seems violate the semantics of the LRU flag, since it's clear that this folio is still in the LRU list. With your changes, I think we should drain the page vectors before calling folio_test_lru(), otherwise some cases will fail to check folio_test_lru() if the folio remain in the page vectors for an extended period. > struct folio_batch *fbatch; > > folio_get(folio); > @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) > /* Deactivating an unevictable folio will not accelerate > reclaim */ > if (folio_test_unevictable(folio)) > return; > + if (!folio_test_clear_lru(folio)) > + return; > > folio_get(folio); > local_lock(&cpu_fbatches.lock); > @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) > void folio_deactivate(struct folio *folio) > { > if (folio_test_lru(folio) && !folio_test_unevictable(folio) && > - (folio_test_active(folio) || lru_gen_enabled())) { > + (folio_test_active(folio) || lru_gen_enabled()) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); > @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) > { > if (folio_test_lru(folio) && folio_test_anon(folio) && > folio_test_swapbacked(folio) && > !folio_test_swapcache(folio) && > - !folio_test_unevictable(folio)) { > + !folio_test_unevictable(folio) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); > > >
On 05.06.24 13:37, Baolin Wang wrote: > > > On 2024/6/5 17:53, David Hildenbrand wrote: >> On 05.06.24 11:41, David Hildenbrand wrote: >>> On 05.06.24 03:18, yangge1116 wrote: >>>> >>>> >>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>> From: yangge <yangge1116@126.com> >>>>>> >>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>> the page from pagevec decreases one. Page migration requires the >>>>>> page is not referenced by others except page mapping. Before >>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>> pagevec, the migration will fail. >>>>>> >>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>> referenced by pagevec. >>>>> >>>>> What you are saying is that we might have a page on which >>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>>> correct? >>>> >>>> Yes >>>> >>>>> >>>>> Can you describe under which circumstances that happens? >>>>> >>>> >>>> If we call folio_activate() to move a page from inactive LRU list to >>>> active LRU list, the page is not only in LRU list, but also in one of >>>> the cpu_fbatches. >>>> >>>> void folio_activate(struct folio *folio) >>>> { >>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>> !folio_test_unevictable(folio)) { >>>> struct folio_batch *fbatch; >>>> >>>> folio_get(folio); >>>> //After this, folio is in LRU list, and its ref count have >>>> increased one. >>>> >>>> local_lock(&cpu_fbatches.lock); >>>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >>>> local_unlock(&cpu_fbatches.lock); >>>> } >>>> } >>> >>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>> >>> It would be really helpful if we could reliably identify whether LRU >>> batching code has a raised reference on a folio. >>> >>> We have the same scenario in >>> * folio_deactivate() >>> * folio_mark_lazyfree() >>> >>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>> >>> No expert on that code, I'm wondering if we could move the >>> folio_test_clear_lru() out, such that we can more reliably identify >>> whether a folio is on the LRU batch or not. >> >> I'm sure there would be something extremely broken with the following >> (I don't know what I'm doing ;) ), but I wonder if there would be a way >> to make something like that work (and perform well enough?). 
>> >> diff --git a/mm/swap.c b/mm/swap.c >> index 67786cb771305..642e471c3ec5a 100644 >> --- a/mm/swap.c >> +++ b/mm/swap.c >> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch >> *fbatch, move_fn_t move_fn) >> for (i = 0; i < folio_batch_count(fbatch); i++) { >> struct folio *folio = fbatch->folios[i]; >> >> - /* block memcg migration while the folio moves between >> lru */ >> - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) >> - continue; >> - >> folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >> move_fn(lruvec, folio); >> >> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >> struct folio *folio) >> */ >> void folio_rotate_reclaimable(struct folio *folio) >> { >> - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >> - !folio_test_unevictable(folio) && folio_test_lru(folio)) { >> + if (folio_test_lru(folio) && !folio_test_locked(folio) && >> + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && >> + folio_test_clear_lru(folio)) { >> struct folio_batch *fbatch; >> unsigned long flags; >> >> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >> void folio_activate(struct folio *folio) >> { >> if (folio_test_lru(folio) && !folio_test_active(folio) && >> - !folio_test_unevictable(folio)) { >> + !folio_test_unevictable(folio) && >> folio_test_clear_lru(folio)) { > > IMO, this seems violate the semantics of the LRU flag, since it's clear > that this folio is still in the LRU list. Good point. But regarding "violation": we already do clear the flag temporarily in there, so it's rather that we make the visible time where it is cleared "longer". (yes, it can be much longer :) )
On 05.06.24 13:41, David Hildenbrand wrote: > On 05.06.24 13:37, Baolin Wang wrote: >> >> >> On 2024/6/5 17:53, David Hildenbrand wrote: >>> On 05.06.24 11:41, David Hildenbrand wrote: >>>> On 05.06.24 03:18, yangge1116 wrote: >>>>> >>>>> >>>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>>> From: yangge <yangge1116@126.com> >>>>>>> >>>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>>> the page from pagevec decreases one. Page migration requires the >>>>>>> page is not referenced by others except page mapping. Before >>>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>>> pagevec, the migration will fail. >>>>>>> >>>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>>> referenced by pagevec. >>>>>> >>>>>> What you are saying is that we might have a page on which >>>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>>>> correct? >>>>> >>>>> Yes >>>>> >>>>>> >>>>>> Can you describe under which circumstances that happens? >>>>>> >>>>> >>>>> If we call folio_activate() to move a page from inactive LRU list to >>>>> active LRU list, the page is not only in LRU list, but also in one of >>>>> the cpu_fbatches. >>>>> >>>>> void folio_activate(struct folio *folio) >>>>> { >>>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>>> !folio_test_unevictable(folio)) { >>>>> struct folio_batch *fbatch; >>>>> >>>>> folio_get(folio); >>>>> //After this, folio is in LRU list, and its ref count have >>>>> increased one. >>>>> >>>>> local_lock(&cpu_fbatches.lock); >>>>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>>> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >>>>> local_unlock(&cpu_fbatches.lock); >>>>> } >>>>> } >>>> >>>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>>> >>>> It would be really helpful if we could reliably identify whether LRU >>>> batching code has a raised reference on a folio. >>>> >>>> We have the same scenario in >>>> * folio_deactivate() >>>> * folio_mark_lazyfree() >>>> >>>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>>> >>>> No expert on that code, I'm wondering if we could move the >>>> folio_test_clear_lru() out, such that we can more reliably identify >>>> whether a folio is on the LRU batch or not. >>> >>> I'm sure there would be something extremely broken with the following >>> (I don't know what I'm doing ;) ), but I wonder if there would be a way >>> to make something like that work (and perform well enough?). 
>>> >>> diff --git a/mm/swap.c b/mm/swap.c >>> index 67786cb771305..642e471c3ec5a 100644 >>> --- a/mm/swap.c >>> +++ b/mm/swap.c >>> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch >>> *fbatch, move_fn_t move_fn) >>> for (i = 0; i < folio_batch_count(fbatch); i++) { >>> struct folio *folio = fbatch->folios[i]; >>> >>> - /* block memcg migration while the folio moves between >>> lru */ >>> - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) >>> - continue; >>> - >>> folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >>> move_fn(lruvec, folio); >>> >>> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >>> struct folio *folio) >>> */ >>> void folio_rotate_reclaimable(struct folio *folio) >>> { >>> - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >>> - !folio_test_unevictable(folio) && folio_test_lru(folio)) { >>> + if (folio_test_lru(folio) && !folio_test_locked(folio) && >>> + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && >>> + folio_test_clear_lru(folio)) { >>> struct folio_batch *fbatch; >>> unsigned long flags; >>> >>> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >>> void folio_activate(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>> - !folio_test_unevictable(folio)) { >>> + !folio_test_unevictable(folio) && >>> folio_test_clear_lru(folio)) { >> >> IMO, this seems violate the semantics of the LRU flag, since it's clear >> that this folio is still in the LRU list. > > Good point. > > But regarding "violation": we already do clear the flag temporarily in > there, so it's rather that we make the visible time where it is cleared > "longer". (yes, it can be much longer :) ) Some random thoughts about some folio_test_lru() users: mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail skip it either way if there is the unexpected reference from the LRU batch! mm/compaction.c: skips pages if !folio_test_lru(), but would fail skip it either way if there is the unexpected reference from the LRU batch! mm/memory.c: would love to identify this case and to a lru_add_drain() to free up that reference. mm/huge_memory.c: splitting with the additional reference will fail already. Maybe we'd want to drain the LRU batch. mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if we have the same page twice in an LRU batch with different target goals ... Some other users (there are not that many that don't use it for sanity checks though) might likely be a bit different.
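To make the "same page twice in an LRU batch" case concrete, here is one hypothetical interleaving, sketched only from the entry checks and the folio_batch_move_lru() behaviour quoted above:

/*
 * Hypothetical interleaving (sketch). Both entry paths only test
 * folio_test_lru(), and the flag stays set while the folio sits in a
 * batch:
 *
 *   CPU 0: folio_activate(folio)    // LRU flag set -> queued on
 *                                   // cpu_fbatches.activate, +1 ref
 *   CPU 1: folio_deactivate(folio)  // LRU flag still set -> also queued
 *                                   // on cpu_fbatches.deactivate, +1 ref
 *
 * Each flush then does folio_test_clear_lru(), applies its move_fn(),
 * and sets the LRU flag again, so the folio's final list is decided by
 * whichever batch happens to be flushed last.
 */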
在 2024/6/5 下午5:53, David Hildenbrand 写道: > On 05.06.24 11:41, David Hildenbrand wrote: >> On 05.06.24 03:18, yangge1116 wrote: >>> >>> >>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>> From: yangge <yangge1116@126.com> >>>>> >>>>> If a page is added in pagevec, its ref count increases one, remove >>>>> the page from pagevec decreases one. Page migration requires the >>>>> page is not referenced by others except page mapping. Before >>>>> migrating a page, we should try to drain the page from pagevec in >>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>> to tell whether the page is in pagevec or not, if the page is in >>>>> pagevec, the migration will fail. >>>>> >>>>> Remove the condition and drain lru once to ensure the page is not >>>>> referenced by pagevec. >>>> >>>> What you are saying is that we might have a page on which >>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>> correct? >>> >>> Yes >>> >>>> >>>> Can you describe under which circumstances that happens? >>>> >>> >>> If we call folio_activate() to move a page from inactive LRU list to >>> active LRU list, the page is not only in LRU list, but also in one of >>> the cpu_fbatches. >>> >>> void folio_activate(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>> !folio_test_unevictable(folio)) { >>> struct folio_batch *fbatch; >>> >>> folio_get(folio); >>> //After this, folio is in LRU list, and its ref count have >>> increased one. >>> >>> local_lock(&cpu_fbatches.lock); >>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >>> local_unlock(&cpu_fbatches.lock); >>> } >>> } >> >> Interesting, the !SMP variant does the folio_test_clear_lru(). >> >> It would be really helpful if we could reliably identify whether LRU >> batching code has a raised reference on a folio. >> >> We have the same scenario in >> * folio_deactivate() >> * folio_mark_lazyfree() >> >> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >> >> No expert on that code, I'm wondering if we could move the >> folio_test_clear_lru() out, such that we can more reliably identify >> whether a folio is on the LRU batch or not. > > I'm sure there would be something extremely broken with the following > (I don't know what I'm doing ;) ), but I wonder if there would be a way > to make something like that work (and perform well enough?). 
> > diff --git a/mm/swap.c b/mm/swap.c > index 67786cb771305..642e471c3ec5a 100644 > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch > *fbatch, move_fn_t move_fn) > for (i = 0; i < folio_batch_count(fbatch); i++) { > struct folio *folio = fbatch->folios[i]; > > - /* block memcg migration while the folio moves between > lru */ > - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) > - continue; > - > folio_lruvec_relock_irqsave(folio, &lruvec, &flags); > move_fn(lruvec, folio); > > @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, > struct folio *folio) > */ > void folio_rotate_reclaimable(struct folio *folio) > { > - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && > - !folio_test_unevictable(folio) && folio_test_lru(folio)) { > + if (folio_test_lru(folio) && !folio_test_locked(folio) && > + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > unsigned long flags; > > @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) > void folio_activate(struct folio *folio) > { > if (folio_test_lru(folio) && !folio_test_active(folio) && > - !folio_test_unevictable(folio)) { > + !folio_test_unevictable(folio) && > folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); > @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) > /* Deactivating an unevictable folio will not accelerate > reclaim */ > if (folio_test_unevictable(folio)) > return; > + if (!folio_test_clear_lru(folio)) > + return; > > folio_get(folio); > local_lock(&cpu_fbatches.lock); > @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) > void folio_deactivate(struct folio *folio) > { > if (folio_test_lru(folio) && !folio_test_unevictable(folio) && > - (folio_test_active(folio) || lru_gen_enabled())) { > + (folio_test_active(folio) || lru_gen_enabled()) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); > @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) > { > if (folio_test_lru(folio) && folio_test_anon(folio) && > folio_test_swapbacked(folio) && > !folio_test_swapcache(folio) && > - !folio_test_unevictable(folio)) { > + !folio_test_unevictable(folio) && > + folio_test_clear_lru(folio)) { > struct folio_batch *fbatch; > > folio_get(folio); With your changes, we will call folio_test_clear_lru(folio) firstly to clear the LRU flag, and then call folio_get(folio) to pin the folio, seems a little unreasonable. Normally, folio_get(folio) is called firstly to pin the page, and then some other functions is called to handle the folio.
On 2024/6/5 20:20, David Hildenbrand wrote: > On 05.06.24 13:41, David Hildenbrand wrote: >> On 05.06.24 13:37, Baolin Wang wrote: >>> >>> >>> On 2024/6/5 17:53, David Hildenbrand wrote: >>>> On 05.06.24 11:41, David Hildenbrand wrote: >>>>> On 05.06.24 03:18, yangge1116 wrote: >>>>>> >>>>>> >>>>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>>>> From: yangge <yangge1116@126.com> >>>>>>>> >>>>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>>>> the page from pagevec decreases one. Page migration requires the >>>>>>>> page is not referenced by others except page mapping. Before >>>>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>>>> pagevec, the migration will fail. >>>>>>>> >>>>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>>>> referenced by pagevec. >>>>>>> >>>>>>> What you are saying is that we might have a page on which >>>>>>> folio_test_lru() succeeds, that was added to one of the >>>>>>> cpu_fbatches, >>>>>>> correct? >>>>>> >>>>>> Yes >>>>>> >>>>>>> >>>>>>> Can you describe under which circumstances that happens? >>>>>>> >>>>>> >>>>>> If we call folio_activate() to move a page from inactive LRU list to >>>>>> active LRU list, the page is not only in LRU list, but also in one of >>>>>> the cpu_fbatches. >>>>>> >>>>>> void folio_activate(struct folio *folio) >>>>>> { >>>>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>>>> !folio_test_unevictable(folio)) { >>>>>> struct folio_batch *fbatch; >>>>>> >>>>>> folio_get(folio); >>>>>> //After this, folio is in LRU list, and its ref count >>>>>> have >>>>>> increased one. >>>>>> >>>>>> local_lock(&cpu_fbatches.lock); >>>>>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>>>> folio_batch_add_and_move(fbatch, folio, >>>>>> folio_activate_fn); >>>>>> local_unlock(&cpu_fbatches.lock); >>>>>> } >>>>>> } >>>>> >>>>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>>>> >>>>> It would be really helpful if we could reliably identify whether LRU >>>>> batching code has a raised reference on a folio. >>>>> >>>>> We have the same scenario in >>>>> * folio_deactivate() >>>>> * folio_mark_lazyfree() >>>>> >>>>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>>>> >>>>> No expert on that code, I'm wondering if we could move the >>>>> folio_test_clear_lru() out, such that we can more reliably identify >>>>> whether a folio is on the LRU batch or not. >>>> >>>> I'm sure there would be something extremely broken with the following >>>> (I don't know what I'm doing ;) ), but I wonder if there would be a way >>>> to make something like that work (and perform well enough?). 
>>>> >>>> diff --git a/mm/swap.c b/mm/swap.c >>>> index 67786cb771305..642e471c3ec5a 100644 >>>> --- a/mm/swap.c >>>> +++ b/mm/swap.c >>>> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct >>>> folio_batch >>>> *fbatch, move_fn_t move_fn) >>>> for (i = 0; i < folio_batch_count(fbatch); i++) { >>>> struct folio *folio = fbatch->folios[i]; >>>> >>>> - /* block memcg migration while the folio moves between >>>> lru */ >>>> - if (move_fn != lru_add_fn && >>>> !folio_test_clear_lru(folio)) >>>> - continue; >>>> - >>>> folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >>>> move_fn(lruvec, folio); >>>> >>>> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >>>> struct folio *folio) >>>> */ >>>> void folio_rotate_reclaimable(struct folio *folio) >>>> { >>>> - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >>>> - !folio_test_unevictable(folio) && folio_test_lru(folio)) { >>>> + if (folio_test_lru(folio) && !folio_test_locked(folio) && >>>> + !folio_test_dirty(folio) && >>>> !folio_test_unevictable(folio) && >>>> + folio_test_clear_lru(folio)) { >>>> struct folio_batch *fbatch; >>>> unsigned long flags; >>>> >>>> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >>>> void folio_activate(struct folio *folio) >>>> { >>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>> - !folio_test_unevictable(folio)) { >>>> + !folio_test_unevictable(folio) && >>>> folio_test_clear_lru(folio)) { >>> >>> IMO, this seems violate the semantics of the LRU flag, since it's clear >>> that this folio is still in the LRU list. >> >> Good point. >> >> But regarding "violation": we already do clear the flag temporarily in >> there, so it's rather that we make the visible time where it is cleared >> "longer". (yes, it can be much longer :) ) > > Some random thoughts about some folio_test_lru() users: > > mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail skip > it either way if there is the unexpected reference from the LRU batch! > > mm/compaction.c: skips pages if !folio_test_lru(), but would fail skip > it either way if there is the unexpected reference from the LRU batch! > > mm/memory.c: would love to identify this case and to a lru_add_drain() > to free up that reference. > > mm/huge_memory.c: splitting with the additional reference will fail > already. Maybe we'd want to drain the LRU batch. Agree. > > mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if > we have the same page twice in an LRU batch with different target goals ... IIUC, LRU batch can ignore this folio since it's LRU flag is cleared by folio_isolate_lru(), then will call folios_put() to frop the reference. > Some other users (there are not that many that don't use it for sanity > checks though) might likely be a bit different. mm/page_isolation.c: fail to set pageblock migratetype to isolate if !folio_test_lru(), then alloc_contig_range_noprof() can be failed. But the original code could set pageblock migratetype to isolate, then calling drain_all_pages() in alloc_contig_range_noprof() to drop reference of the LRU batch. mm/vmscan.c: will call lru_add_drain() before calling isolate_lru_folios(), so seems no impact. BTW, we also need to look at the usage of folio_isolate_lru(). It doesn’t seem to have major obstacles, but there are many details to analyze :)
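The mm/vmscan.c sequence referred to above is roughly the following (a paraphrased, heavily elided fragment of shrink_inactive_list(); note the drain is the local-CPU-only lru_add_drain(), which the reply below picks up on):

	/* sketch of mm/vmscan.c, shrink_inactive_list() */
	lru_add_drain();

	spin_lock_irq(&lruvec->lru_lock);
	nr_taken = isolate_lru_folios(nr_to_scan, lruvec, &folio_list,
				      &nr_scanned, sc, lru);
	/* ... accounting, unlock, then shrink_folio_list() on folio_list ... */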
On 06.06.24 03:35, yangge1116 wrote: > > > 在 2024/6/5 下午5:53, David Hildenbrand 写道: >> On 05.06.24 11:41, David Hildenbrand wrote: >>> On 05.06.24 03:18, yangge1116 wrote: >>>> >>>> >>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>> From: yangge <yangge1116@126.com> >>>>>> >>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>> the page from pagevec decreases one. Page migration requires the >>>>>> page is not referenced by others except page mapping. Before >>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>> pagevec, the migration will fail. >>>>>> >>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>> referenced by pagevec. >>>>> >>>>> What you are saying is that we might have a page on which >>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>>> correct? >>>> >>>> Yes >>>> >>>>> >>>>> Can you describe under which circumstances that happens? >>>>> >>>> >>>> If we call folio_activate() to move a page from inactive LRU list to >>>> active LRU list, the page is not only in LRU list, but also in one of >>>> the cpu_fbatches. >>>> >>>> void folio_activate(struct folio *folio) >>>> { >>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>> !folio_test_unevictable(folio)) { >>>> struct folio_batch *fbatch; >>>> >>>> folio_get(folio); >>>> //After this, folio is in LRU list, and its ref count have >>>> increased one. >>>> >>>> local_lock(&cpu_fbatches.lock); >>>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>> folio_batch_add_and_move(fbatch, folio, folio_activate_fn); >>>> local_unlock(&cpu_fbatches.lock); >>>> } >>>> } >>> >>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>> >>> It would be really helpful if we could reliably identify whether LRU >>> batching code has a raised reference on a folio. >>> >>> We have the same scenario in >>> * folio_deactivate() >>> * folio_mark_lazyfree() >>> >>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>> >>> No expert on that code, I'm wondering if we could move the >>> folio_test_clear_lru() out, such that we can more reliably identify >>> whether a folio is on the LRU batch or not. >> >> I'm sure there would be something extremely broken with the following >> (I don't know what I'm doing ;) ), but I wonder if there would be a way >> to make something like that work (and perform well enough?). 
>> >> diff --git a/mm/swap.c b/mm/swap.c >> index 67786cb771305..642e471c3ec5a 100644 >> --- a/mm/swap.c >> +++ b/mm/swap.c >> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch >> *fbatch, move_fn_t move_fn) >> for (i = 0; i < folio_batch_count(fbatch); i++) { >> struct folio *folio = fbatch->folios[i]; >> >> - /* block memcg migration while the folio moves between >> lru */ >> - if (move_fn != lru_add_fn && !folio_test_clear_lru(folio)) >> - continue; >> - >> folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >> move_fn(lruvec, folio); >> >> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >> struct folio *folio) >> */ >> void folio_rotate_reclaimable(struct folio *folio) >> { >> - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >> - !folio_test_unevictable(folio) && folio_test_lru(folio)) { >> + if (folio_test_lru(folio) && !folio_test_locked(folio) && >> + !folio_test_dirty(folio) && !folio_test_unevictable(folio) && >> + folio_test_clear_lru(folio)) { >> struct folio_batch *fbatch; >> unsigned long flags; >> >> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >> void folio_activate(struct folio *folio) >> { >> if (folio_test_lru(folio) && !folio_test_active(folio) && >> - !folio_test_unevictable(folio)) { >> + !folio_test_unevictable(folio) && >> folio_test_clear_lru(folio)) { >> struct folio_batch *fbatch; >> >> folio_get(folio); >> @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) >> /* Deactivating an unevictable folio will not accelerate >> reclaim */ >> if (folio_test_unevictable(folio)) >> return; >> + if (!folio_test_clear_lru(folio)) >> + return; >> >> folio_get(folio); >> local_lock(&cpu_fbatches.lock); >> @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) >> void folio_deactivate(struct folio *folio) >> { >> if (folio_test_lru(folio) && !folio_test_unevictable(folio) && >> - (folio_test_active(folio) || lru_gen_enabled())) { >> + (folio_test_active(folio) || lru_gen_enabled()) && >> + folio_test_clear_lru(folio)) { >> struct folio_batch *fbatch; >> >> folio_get(folio); >> @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) >> { >> if (folio_test_lru(folio) && folio_test_anon(folio) && >> folio_test_swapbacked(folio) && >> !folio_test_swapcache(folio) && >> - !folio_test_unevictable(folio)) { >> + !folio_test_unevictable(folio) && >> + folio_test_clear_lru(folio)) { >> struct folio_batch *fbatch; >> >> folio_get(folio); > > With your changes, we will call folio_test_clear_lru(folio) firstly to > clear the LRU flag, and then call folio_get(folio) to pin the folio, > seems a little unreasonable. Normally, folio_get(folio) is called > firstly to pin the page, and then some other functions is called to > handle the folio. Right, if that really matters (not sure if it does) we could do if (folio_test_lru(folio) && ... folio_get(folio); if (!folio_test_clear_lru(folio)) { folio_put(folio) } else { struct folio_batch *fbatch; } }
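Spelled out as a complete function, reusing the folio_activate() body quoted earlier in the thread (an untested sketch, same caveats as the diff above):

void folio_activate(struct folio *folio)
{
	if (folio_test_lru(folio) && !folio_test_active(folio) &&
	    !folio_test_unevictable(folio)) {
		struct folio_batch *fbatch;

		/* Pin first, then try to take ownership of the LRU flag. */
		folio_get(folio);
		if (!folio_test_clear_lru(folio)) {
			/* Someone else owns the flag, e.g. another batch. */
			folio_put(folio);
			return;
		}

		local_lock(&cpu_fbatches.lock);
		fbatch = this_cpu_ptr(&cpu_fbatches.activate);
		folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
		local_unlock(&cpu_fbatches.lock);
	}
}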
>> Some random thoughts about some folio_test_lru() users: >> >> mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail skip >> it either way if there is the unexpected reference from the LRU batch! >> >> mm/compaction.c: skips pages if !folio_test_lru(), but would fail skip >> it either way if there is the unexpected reference from the LRU batch! >> >> mm/memory.c: would love to identify this case and to a lru_add_drain() >> to free up that reference. >> >> mm/huge_memory.c: splitting with the additional reference will fail >> already. Maybe we'd want to drain the LRU batch. > > Agree. > >> >> mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if >> we have the same page twice in an LRU batch with different target goals ... > > IIUC, LRU batch can ignore this folio since it's LRU flag is cleared by > folio_isolate_lru(), then will call folios_put() to frop the reference. > I think what's interesting to highlight in the current design is that a folio might end up in multiple LRU batches, and whatever the result will be is determined by the sequence of them getting flushed. Doesn't sound quite right but maybe there was a reason for it (which could just have been "simpler implementation"). > >> Some other users (there are not that many that don't use it for sanity >> checks though) might likely be a bit different. There are also some PageLRU checks, but not that many. > > mm/page_isolation.c: fail to set pageblock migratetype to isolate if > !folio_test_lru(), then alloc_contig_range_noprof() can be failed. But > the original code could set pageblock migratetype to isolate, then > calling drain_all_pages() in alloc_contig_range_noprof() to drop > reference of the LRU batch. > > mm/vmscan.c: will call lru_add_drain() before calling > isolate_lru_folios(), so seems no impact. lru_add_drain() will only drain the local CPU. So if the folio would be stuck on another CPU's LRU batch, right now we could isolate it. When processing that LRU batch while the folio is still isolated, it would currently simply skip the operation. So right now we can call isolate_lru_folios() even if the folio is stuck on another CPU's LRU batch. We cannot really reclaim the folio as long is it is in another CPU's LRU batch, though (unexpected reference). > > BTW, we also need to look at the usage of folio_isolate_lru(). Yes. > > It doesn’t seem to have major obstacles, but there are many details to > analyze :) Yes, we're only scratching the surface. Having a way to identify "this folio is very likely some CPU's LRU batch" could end up being quite valuable, because likely we don't want to blindly drain the LRU simply because there is some unexpected reference on a folio [as we would in this patch].
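For reference, the scope difference between the two drain primitives discussed here, paraphrased from mm/swap.c:

/* Paraphrased from mm/swap.c: drains only the batches of the CPU it
 * runs on; a folio queued on another CPU's batch is unaffected. */
void lru_add_drain(void)
{
	local_lock(&cpu_fbatches.lock);
	lru_add_drain_cpu(smp_processor_id());
	local_unlock(&cpu_fbatches.lock);
	mlock_drain_local();
}

/*
 * lru_add_drain_all(), by contrast, schedules lru_add_drain_per_cpu()
 * work on every CPU that has pending batches and waits for all of them
 * to finish; that cost is why blindly draining on every failed
 * migration is worth avoiding.
 */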
在 2024/6/6 下午3:39, David Hildenbrand 写道: > On 06.06.24 03:35, yangge1116 wrote: >> >> >> 在 2024/6/5 下午5:53, David Hildenbrand 写道: >>> On 05.06.24 11:41, David Hildenbrand wrote: >>>> On 05.06.24 03:18, yangge1116 wrote: >>>>> >>>>> >>>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>>> From: yangge <yangge1116@126.com> >>>>>>> >>>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>>> the page from pagevec decreases one. Page migration requires the >>>>>>> page is not referenced by others except page mapping. Before >>>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>>> pagevec, the migration will fail. >>>>>>> >>>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>>> referenced by pagevec. >>>>>> >>>>>> What you are saying is that we might have a page on which >>>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>>>> correct? >>>>> >>>>> Yes >>>>> >>>>>> >>>>>> Can you describe under which circumstances that happens? >>>>>> >>>>> >>>>> If we call folio_activate() to move a page from inactive LRU list to >>>>> active LRU list, the page is not only in LRU list, but also in one of >>>>> the cpu_fbatches. >>>>> >>>>> void folio_activate(struct folio *folio) >>>>> { >>>>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>>>> !folio_test_unevictable(folio)) { >>>>> struct folio_batch *fbatch; >>>>> >>>>> folio_get(folio); >>>>> //After this, folio is in LRU list, and its ref count have >>>>> increased one. >>>>> >>>>> local_lock(&cpu_fbatches.lock); >>>>> fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>>> folio_batch_add_and_move(fbatch, folio, >>>>> folio_activate_fn); >>>>> local_unlock(&cpu_fbatches.lock); >>>>> } >>>>> } >>>> >>>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>>> >>>> It would be really helpful if we could reliably identify whether LRU >>>> batching code has a raised reference on a folio. >>>> >>>> We have the same scenario in >>>> * folio_deactivate() >>>> * folio_mark_lazyfree() >>>> >>>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>>> >>>> No expert on that code, I'm wondering if we could move the >>>> folio_test_clear_lru() out, such that we can more reliably identify >>>> whether a folio is on the LRU batch or not. >>> >>> I'm sure there would be something extremely broken with the following >>> (I don't know what I'm doing ;) ), but I wonder if there would be a way >>> to make something like that work (and perform well enough?). 
>>> >>> diff --git a/mm/swap.c b/mm/swap.c >>> index 67786cb771305..642e471c3ec5a 100644 >>> --- a/mm/swap.c >>> +++ b/mm/swap.c >>> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch >>> *fbatch, move_fn_t move_fn) >>> for (i = 0; i < folio_batch_count(fbatch); i++) { >>> struct folio *folio = fbatch->folios[i]; >>> >>> - /* block memcg migration while the folio moves between >>> lru */ >>> - if (move_fn != lru_add_fn && >>> !folio_test_clear_lru(folio)) >>> - continue; >>> - >>> folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >>> move_fn(lruvec, folio); >>> >>> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >>> struct folio *folio) >>> */ >>> void folio_rotate_reclaimable(struct folio *folio) >>> { >>> - if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >>> - !folio_test_unevictable(folio) && folio_test_lru(folio)) { >>> + if (folio_test_lru(folio) && !folio_test_locked(folio) && >>> + !folio_test_dirty(folio) && >>> !folio_test_unevictable(folio) && >>> + folio_test_clear_lru(folio)) { >>> struct folio_batch *fbatch; >>> unsigned long flags; >>> >>> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >>> void folio_activate(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && !folio_test_active(folio) && >>> - !folio_test_unevictable(folio)) { >>> + !folio_test_unevictable(folio) && >>> folio_test_clear_lru(folio)) { >>> struct folio_batch *fbatch; >>> >>> folio_get(folio); >>> @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) >>> /* Deactivating an unevictable folio will not accelerate >>> reclaim */ >>> if (folio_test_unevictable(folio)) >>> return; >>> + if (!folio_test_clear_lru(folio)) >>> + return; >>> >>> folio_get(folio); >>> local_lock(&cpu_fbatches.lock); >>> @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) >>> void folio_deactivate(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && !folio_test_unevictable(folio) && >>> - (folio_test_active(folio) || lru_gen_enabled())) { >>> + (folio_test_active(folio) || lru_gen_enabled()) && >>> + folio_test_clear_lru(folio)) { >>> struct folio_batch *fbatch; >>> >>> folio_get(folio); >>> @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) >>> { >>> if (folio_test_lru(folio) && folio_test_anon(folio) && >>> folio_test_swapbacked(folio) && >>> !folio_test_swapcache(folio) && >>> - !folio_test_unevictable(folio)) { >>> + !folio_test_unevictable(folio) && >>> + folio_test_clear_lru(folio)) { >>> struct folio_batch *fbatch; >>> >>> folio_get(folio); >> >> With your changes, we will call folio_test_clear_lru(folio) firstly to >> clear the LRU flag, and then call folio_get(folio) to pin the folio, >> seems a little unreasonable. Normally, folio_get(folio) is called >> firstly to pin the page, and then some other functions is called to >> handle the folio. > > Right, if that really matters (not sure if it does) we could do > > if (folio_test_lru(folio) && ... > folio_get(folio); > if (!folio_test_clear_lru(folio)) { > folio_put(folio) > } else { > struct folio_batch *fbatch; > } > } > Right, it seems can work. As discussed above, it will make the visible time where it is cleared "longer". Other users of lru_add_drain_all() don't check whether folio is in lru, seems there's no need to check here.
在 2024/6/6 下午3:56, David Hildenbrand 写道: >>> Some random thoughts about some folio_test_lru() users: >>> >>> mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail skip >>> it either way if there is the unexpected reference from the LRU batch! >>> >>> mm/compaction.c: skips pages if !folio_test_lru(), but would fail skip >>> it either way if there is the unexpected reference from the LRU batch! >>> >>> mm/memory.c: would love to identify this case and to a lru_add_drain() >>> to free up that reference. >>> >>> mm/huge_memory.c: splitting with the additional reference will fail >>> already. Maybe we'd want to drain the LRU batch. >> >> Agree. >> >>> >>> mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if >>> we have the same page twice in an LRU batch with different target >>> goals ... >> >> IIUC, LRU batch can ignore this folio since it's LRU flag is cleared by >> folio_isolate_lru(), then will call folios_put() to frop the reference. >> > > I think what's interesting to highlight in the current design is that a > folio might end up in multiple LRU batches, and whatever the result will > be is determined by the sequence of them getting flushed. Doesn't sound > quite right but maybe there was a reason for it (which could just have > been "simpler implementation"). > >> >>> Some other users (there are not that many that don't use it for sanity >>> checks though) might likely be a bit different. > > There are also some PageLRU checks, but not that many. > >> >> mm/page_isolation.c: fail to set pageblock migratetype to isolate if >> !folio_test_lru(), then alloc_contig_range_noprof() can be failed. But >> the original code could set pageblock migratetype to isolate, then >> calling drain_all_pages() in alloc_contig_range_noprof() to drop >> reference of the LRU batch. >> >> mm/vmscan.c: will call lru_add_drain() before calling >> isolate_lru_folios(), so seems no impact. > > lru_add_drain() will only drain the local CPU. So if the folio would be > stuck on another CPU's LRU batch, right now we could isolate it. When > processing that LRU batch while the folio is still isolated, it would > currently simply skip the operation. > > So right now we can call isolate_lru_folios() even if the folio is stuck > on another CPU's LRU batch. > > We cannot really reclaim the folio as long is it is in another CPU's LRU > batch, though (unexpected reference). > >> >> BTW, we also need to look at the usage of folio_isolate_lru(). > > Yes. > >> >> It doesn’t seem to have major obstacles, but there are many details to >> analyze :) > > Yes, we're only scratching the surface. > > Having a way to identify "this folio is very likely some CPU's LRU > batch" could end up being quite valuable, because likely we don't want > to blindly drain the LRU simply because there is some unexpected > reference on a folio [as we would in this patch]. > Can we add a PG_lru_batch flag to determine whether a page is in lru batch? If we can, seems this problem will be easier.
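A hypothetical sketch of that idea, applied to the folio_activate() body from earlier in the thread; PG_lru_batch and the folio_test_set_lru_batch() helper do not exist upstream and are invented here purely for illustration:

/* HYPOTHETICAL: assumes a new PG_lru_batch flag with the usual
 * test/set/clear helpers, marking "sits in some CPU's folio_batch". */
void folio_activate(struct folio *folio)
{
	if (folio_test_lru(folio) && !folio_test_active(folio) &&
	    !folio_test_unevictable(folio) &&
	    !folio_test_set_lru_batch(folio)) {	/* not already batched */
		struct folio_batch *fbatch;

		folio_get(folio);
		local_lock(&cpu_fbatches.lock);
		fbatch = this_cpu_ptr(&cpu_fbatches.activate);
		folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
		local_unlock(&cpu_fbatches.lock);
	}
}

folio_batch_move_lru() would then clear PG_lru_batch again for each folio it processes, and GUP could test the flag instead of draining blindly.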
On Sat, Jun 08, 2024 at 12:38:49PM +0800, yangge1116 wrote: > Can we add a PG_lru_batch flag to determine whether a page is in lru batch? > If we can, seems this problem will be easier. Page flags are in short supply. You'd need a really good justification.
On 08.06.24 17:15, Matthew Wilcox wrote: > On Sat, Jun 08, 2024 at 12:38:49PM +0800, yangge1116 wrote: >> Can we add a PG_lru_batch flag to determine whether a page is in lru batch? >> If we can, seems this problem will be easier. > > Page flags are in short supply. You'd need a really good justification. > A flag would not be able to handle the "part of multiple LRU batches" that should currently possible (when to clear the flag?). Well, if we have to keep supporting that. If we only to be part in a single LRU batch, a new flag could work and we could still allow isolating a folio from LRU while in some LRU batch. If we could handle it using the existing flags, that would of course be better (wondering if we could store more information in the existing flags by using a different encoding for the different states). The temporary clearing of the LRU flag we do right now tells me that it's already not 100% reliable, so the question is how much more unreliable we can make it before it would hurt :)
在 2024/6/9 上午12:03, David Hildenbrand 写道: > On 08.06.24 17:15, Matthew Wilcox wrote: >> On Sat, Jun 08, 2024 at 12:38:49PM +0800, yangge1116 wrote: >>> Can we add a PG_lru_batch flag to determine whether a page is in lru >>> batch? >>> If we can, seems this problem will be easier. >> >> Page flags are in short supply. You'd need a really good justification. >> > > A flag would not be able to handle the "part of multiple LRU batches" > that should currently possible (when to clear the flag?). Well, if we > have to keep supporting that. If we only to be part in a single LRU > batch, a new flag could work and we could still allow isolating a folio > from LRU while in some LRU batch. Yes, before adding a folio to LRU batch, check whether the folio has been added. Add the folio to LRU batch only if the folio has not been added. > > If we could handle it using the existing flags, that would of course be > better (wondering if we could store more information in the existing > flags by using a different encoding for the different states). If a folio contains more than one page, the folio will not be added to LRU batch. Can we use folio_test_large(folio) to filter? if (!folio_test_large(folio) && drain_allow) { lru_add_drain_all(); drain_allow = false; } > > The temporary clearing of the LRU flag we do right now tells me that > it's already not 100% reliable, so the question is how much more > unreliable we can make it before it would hurt :) >
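For context on the premise about large folios: in the mm/swap.c of that era, folio_batch_add_and_move() looked roughly like the following, so a large folio does pass through a batch but forces an immediate flush and never lingers there with a raised reference (paraphrased sketch):

static void folio_batch_add_and_move(struct folio_batch *fbatch,
		struct folio *folio, move_fn_t move_fn)
{
	/* Only small folios may stay queued; a large folio, or a full or
	 * disabled cache, flushes the batch immediately. */
	if (folio_batch_add(fbatch, folio) && !folio_test_large(folio) &&
	    !lru_cache_disabled())
		return;
	folio_batch_move_lru(fbatch, move_fn);
}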
On 11.06.24 13:20, yangge1116 wrote: > > > 在 2024/6/9 上午12:03, David Hildenbrand 写道: >> On 08.06.24 17:15, Matthew Wilcox wrote: >>> On Sat, Jun 08, 2024 at 12:38:49PM +0800, yangge1116 wrote: >>>> Can we add a PG_lru_batch flag to determine whether a page is in lru >>>> batch? >>>> If we can, seems this problem will be easier. >>> >>> Page flags are in short supply. You'd need a really good justification. >>> >> >> A flag would not be able to handle the "part of multiple LRU batches" >> that should currently possible (when to clear the flag?). Well, if we >> have to keep supporting that. If we only to be part in a single LRU >> batch, a new flag could work and we could still allow isolating a folio >> from LRU while in some LRU batch. > > Yes, before adding a folio to LRU batch, check whether the folio has > been added. Add the folio to LRU batch only if the folio has not been > added. > >> >> If we could handle it using the existing flags, that would of course be >> better (wondering if we could store more information in the existing >> flags by using a different encoding for the different states). > > If a folio contains more than one page, the folio will not be added to > LRU batch. Can we use folio_test_large(folio) to filter? > > if (!folio_test_large(folio) && drain_allow) { > lru_add_drain_all(); > drain_allow = false; > } I think we should do better than this, and not do arbitrary lru_add_drain_all() calls.
On 2024/6/12 3:32 PM, David Hildenbrand wrote:
> On 11.06.24 13:20, yangge1116 wrote:
>> [...]
>> A folio that contains more than one page is never added to an LRU batch.
>> Can we use folio_test_large(folio) to filter?
>>
>>     if (!folio_test_large(folio) && drain_allow) {
>>         lru_add_drain_all();
>>         drain_allow = false;
>>     }
>
> I think we should do better than this, and not do arbitrary
> lru_add_drain_all() calls.

Thanks, I will prepare the V2.
On 2024/6/12 3:32 PM, David Hildenbrand wrote:
> On 11.06.24 13:20, yangge1116 wrote:
>> [...]
>> Can we use folio_test_large(folio) to filter?
>
> I think we should do better than this, and not do arbitrary
> lru_add_drain_all() calls.

Thanks, I've got another idea.

If we add GUP_PIN_COUNTING_BIAS to a folio's refcount before adding it to
an LRU batch, we can use folio_maybe_dma_pinned(folio) to check whether
the folio is in an LRU batch. I wonder if that's feasible?

    static void folio_batch_add_and_move(struct folio_batch *fbatch,
            struct folio *folio, move_fn_t move_fn)
    {
        if (!folio_test_large(folio)) {
            folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
            if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
                return;
        }
        folio_batch_move_lru(fbatch, move_fn);
    }

    if (!folio_test_large(folio) && folio_maybe_dma_pinned(folio) &&
        drain_allow) {
        lru_add_drain_all();
        drain_allow = false;
    }
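One gap in this sketch: the drain path would also have to drop the bias
again, otherwise drained folios would look pinned forever. Something like
the following hypothetical helper, to pair with the folio_ref_add() above:

    /* Hypothetical counterpart, called while draining a batch. */
    static void folio_batch_unbias(struct folio_batch *fbatch)
    {
        int i;

        for (i = 0; i < folio_batch_count(fbatch); i++) {
            struct folio *folio = fbatch->folios[i];

            /* Undo the bias added in folio_batch_add_and_move(). */
            if (!folio_test_large(folio))
                folio_ref_sub(folio, GUP_PIN_COUNTING_BIAS);
        }
    }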
On 17.06.24 11:50, yangge1116 wrote:
> On 2024/6/12 3:32 PM, David Hildenbrand wrote:
>> [...]
>> I think we should do better than this, and not do arbitrary
>> lru_add_drain_all() calls.
>
> Thanks, I've got another idea.
>
> If we add GUP_PIN_COUNTING_BIAS to a folio's refcount before adding it to
> an LRU batch, we can use folio_maybe_dma_pinned(folio) to check whether
> the folio is in an LRU batch. I wonder if that's feasible?

Why would we want to make folio_maybe_dma_pinned() detection that much
worse?
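To spell out the objection: for small folios, folio_maybe_dma_pinned() is a
pure refcount heuristic, roughly as below (simplified from
include/linux/mm.h; GUP_PIN_COUNTING_BIAS is 1024):

    static inline bool folio_maybe_dma_pinned(struct folio *folio)
    {
        if (folio_test_large(folio))
            return atomic_read(&folio->_pincount) > 0;

        /*
         * Biasing the refcount at batch-add time would push every
         * batched folio over this threshold, so it would report
         * "maybe pinned" even when nobody has pinned it.
         */
        return folio_ref_count(folio) >= GUP_PIN_COUNTING_BIAS;
    }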
On 2024/6/17 5:52 PM, David Hildenbrand wrote:
> Why would we want to make folio_maybe_dma_pinned() detection that much
> worse?
I just wanted to fix this using existing functions; the refcount-bias idea
does seem a little unreasonable. I will prepare a V2 that uses
folio_test_lru(folio) for the check:
    static unsigned long collect_longterm_unpinnable_pages(...)
    {
        ...
        if (!folio_test_lru(folio) && drain_allow) {
            lru_add_drain_all();
            drain_allow = false;
        }
        ...
    }
For that to be reliable, the batching side would clear the LRU flag before
adding the folio to the batch, e.g. in folio_mark_lazyfree():

    void folio_mark_lazyfree(struct folio *folio)
    {
        if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
            !folio_test_swapcache(folio) && !folio_test_unevictable(folio)) {
            struct folio_batch *fbatch;

            folio_get(folio);
            /*
             * If clearing the LRU flag fails, someone else already
             * isolated the folio (or it sits in another batch); back off
             * and drop the reference we just took.
             */
            if (!folio_test_clear_lru(folio)) {
                folio_put(folio);
                return;
            }

            local_lock(&cpu_fbatches.lock);
            fbatch = this_cpu_ptr(&cpu_fbatches.lru_lazyfree);
            folio_batch_add_and_move(fbatch, folio, lru_lazyfree_fn);
            local_unlock(&cpu_fbatches.lock);
        }
    }
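For context, the drain side only restores the flag after the folio has
actually been moved; paraphrased from mm/swap.c of that era (lruvec locking
elided, details may differ from the exact kernel version):

    static void folio_batch_move_lru(struct folio_batch *fbatch,
            move_fn_t move_fn)
    {
        int i;

        for (i = 0; i < folio_batch_count(fbatch); i++) {
            struct folio *folio = fbatch->folios[i];
            struct lruvec *lruvec = folio_lruvec(folio);

            move_fn(lruvec, folio);

            /*
             * The LRU flag only comes back here. With the clearing moved
             * to batch-add time, folio_test_lru() stays false for as long
             * as the folio sits in a batch, which is exactly what the
             * check in collect_longterm_unpinnable_pages() relies on.
             */
            folio_set_lru(folio);
        }

        /* Drops the reference each batch entry held. */
        folios_put(fbatch);
    }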
diff --git a/mm/gup.c b/mm/gup.c
index e17466f..4fa739c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2460,7 +2460,7 @@ static unsigned long collect_longterm_unpinnable_folios(
 			continue;
 		}
 
-		if (!folio_test_lru(folio) && drain_allow) {
+		if (drain_allow) {
 			lru_add_drain_all();
 			drain_allow = false;
 		}