
[v4,0/1] mm: vmascan: retry folios written back while isolated for traditional LRU

Message ID 20241209083618.2889145-1-chenridong@huaweicloud.com (mailing list archive)
Series mm: vmascan: retry folios written back while isolated for traditional LRU

Message

Chen Ridong Dec. 9, 2024, 8:36 a.m. UTC
From: Chen Ridong <chenridong@huawei.com>

Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while
isolated") fixed this issue only for MGLRU. However, the same issue also
exists in the traditional active/inactive LRU. Fix it in the same way for
the active/inactive LRU.

What is fixed:
The page reclaim isolates a batch of folios from the tail of one of the
LRU lists and works on those folios one by one.  For a suitable
swap-backed folio, if the swap device is async, it queues that folio for
writeback.  After the page reclaim finishes an entire batch, it puts back
the folios it queued for writeback to the head of the original LRU list.
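
In shrink_inactive_list() terms, the flow is roughly the following
(simplified sketch; names as in current mm/vmscan.c, details vary by
kernel version):

	nr_taken = isolate_lru_folios(nr_to_scan, lruvec, &folio_list,
				      &nr_scanned, sc, lru);
	...
	/*
	 * pageout() inside shrink_folio_list() marks suitable swap-backed
	 * folios with PG_reclaim and queues their writeback.
	 */
	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
	...
	spin_lock_irq(&lruvec->lru_lock);
	/*
	 * Folios that were not freed, including those just queued for
	 * writeback, go back to the head of the LRU they came from.
	 */
	move_folios_to_lru(lruvec, &folio_list);
	spin_unlock_irq(&lruvec->lru_lock);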

In the meantime, the page writeback flushes the queued folios also by
batches.  Its batching logic is independent from that of the page reclaim.
For each of the folios it writes back, the page writeback calls
folio_rotate_reclaimable() which tries to rotate a folio to the tail.
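
The rotation hook runs from the writeback completion path. Roughly (an
abridged paraphrase of mm/filemap.c and mm/swap.c; exact details vary by
kernel version), it looks like this:

void folio_end_writeback(struct folio *folio)
{
	/* PG_reclaim was set by pageout() when reclaim queued the I/O */
	if (folio_test_reclaim(folio)) {
		folio_clear_reclaim(folio);
		folio_rotate_reclaimable(folio);
	}
	/* ... clear PG_writeback and wake any waiters ... */
}

void folio_rotate_reclaimable(struct folio *folio)
{
	/*
	 * folio_test_lru() is false while reclaim still holds the folio
	 * isolated, so the rotation is silently skipped in that case.
	 */
	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
	    folio_test_unevictable(folio) || !folio_test_lru(folio))
		return;

	/* ... queue the folio to be moved to the tail of its LRU list ... */
}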

folio_rotate_reclaimable() only works for a folio after the page reclaim
has put it back.  If an async swap device is fast enough, the page
writeback can finish with that folio while the page reclaim is still
working on the rest of the batch containing it.  In this case, that folio
will remain at the head and the page reclaim will not retry it before
reaching there.
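
The fix follows the same idea as the MGLRU change: before putting the
isolated batch back, pick out folios whose writeback has already finished
and run them through shrink_folio_list() once more. A minimal sketch of
that retry loop, as it could sit in shrink_inactive_list() after the first
shrink_folio_list() call, is shown below (illustrative only, not the
literal patch; some rejection checks are omitted and the locals are
assumed for the example):

	LIST_HEAD(clean);	/* writeback finished while isolated */
	struct folio *folio, *next;

	list_for_each_entry_safe_reverse(folio, next, &folio_list, lru) {
		/*
		 * Folios that were re-activated, or that are still dirty
		 * or under writeback, are not retry candidates: leave them
		 * to be put back and rotated by folio_rotate_reclaimable()
		 * later.
		 */
		if (folio_test_active(folio) || folio_test_dirty(folio) ||
		    folio_test_writeback(folio))
			continue;
		/*
		 * Writeback completed while the folio was isolated, so the
		 * rotation was missed; retry the folio now instead of
		 * returning it to the head of the inactive list.
		 */
		list_move(&folio->lru, &clean);
	}

	if (!list_empty(&clean)) {
		list_splice_init(&clean, &folio_list);
		nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc,
						  &stat, false);
	}

This way, a folio whose writeback completes while it is isolated is
reclaimed within the same reclaim pass instead of being returned to the
head of the LRU and waiting for another full scan.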

---
v4:
 - combine patch 1 and patch 2 from v3 into a single patch.
 - refine the commit message.
 - fix build errors reported by the kernel test robot <lkp@intel.com>.
v3:
 - fix this issue in the same way as the multi-gen LRU.

v2:
 - detect folios whose writeback has completed and move them to the tail
   of the LRU, as suggested by Barry Song [2].
[2] https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/

v1:
[1] https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/

Chen Ridong (1):
  mm: vmascan: retry folios written back while isolated for traditional
    LRU

 include/linux/mmzone.h |   3 +-
 mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
 2 files changed, 77 insertions(+), 34 deletions(-)

Comments

Andrew Morton Dec. 10, 2024, 2:13 a.m. UTC | #1
On Mon,  9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:

> Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while
> isolated") fixed this issue only for MGLRU. However, the same issue also
> exists in the traditional active/inactive LRU. Fix it in the same way for
> the active/inactive LRU.
> 
> What is fixed:
> The page reclaim isolates a batch of folios from the tail of one of the
> LRU lists and works on those folios one by one.  For a suitable
> swap-backed folio, if the swap device is async, it queues that folio for
> writeback.  After the page reclaim finishes an entire batch, it puts back
> the folios it queued for writeback to the head of the original LRU list.
> 
> In the meantime, the page writeback flushes the queued folios also by
> batches.  Its batching logic is independent from that of the page reclaim.
> For each of the folios it writes back, the page writeback calls
> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> 
> folio_rotate_reclaimable() only works for a folio after the page reclaim
> has put it back.  If an async swap device is fast enough, the page
> writeback can finish with that folio while the page reclaim is still
> working on the rest of the batch containing it.  In this case, that folio
> will remain at the head and the page reclaim will not retry it before
> reaching there.

For a single patch series I think it's best to just make it a single
patch!  No need for a [0/n]: just put all the info into the patch's
changelog.

The patch doesn't apply to current development kernels.  Please check
the mm-unstable branch of
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
linux-next.

Please replace vmascan with vmscan in the title.
chenridong Dec. 10, 2024, 6:42 a.m. UTC | #2
On 2024/12/10 10:13, Andrew Morton wrote:
> On Mon,  9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:
> 
>> Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while
>> isolated") fixed this issue only for MGLRU. However, the same issue also
>> exists in the traditional active/inactive LRU. Fix it in the same way for
>> the active/inactive LRU.
>>
>> What is fixed:
>> The page reclaim isolates a batch of folios from the tail of one of the
>> LRU lists and works on those folios one by one.  For a suitable
>> swap-backed folio, if the swap device is async, it queues that folio for
>> writeback.  After the page reclaim finishes an entire batch, it puts back
>> the folios it queued for writeback to the head of the original LRU list.
>>
>> In the meantime, the page writeback flushes the queued folios also by
>> batches.  Its batching logic is independent from that of the page reclaim.
>> For each of the folios it writes back, the page writeback calls
>> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>>
>> folio_rotate_reclaimable() only works for a folio after the page reclaim
>> has put it back.  If an async swap device is fast enough, the page
>> writeback can finish with that folio while the page reclaim is still
>> working on the rest of the batch containing it.  In this case, that folio
>> will remain at the head and the page reclaim will not retry it before
>> reaching there.
> 
> For a single patch series I think it's best to just make it a single
> patch!  No need for a [0/n]: just put all the info into the patch's
> changelog.
> 
> The patch doesn't apply to current development kernels.  Please check
> the mm-unstable branch of
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
> linux-next.
> 
> Please replace vmascan with vmscan in the title.

Thanks, will update.

Best regards,
Ridong