Message ID: 20241209083618.2889145-1-chenridong@huaweicloud.com (mailing list archive)
Series: mm: vmascan: retry folios written back while isolated for traditional LRU
On Mon, 9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:

> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> while isolated") only fixed the issue for mglru. However, this issue
> also exists in the traditional active/inactive LRU. Fix this issue
> in the same way for active/inactive lru.
>
> What is fixed:
> The page reclaim isolates a batch of folios from the tail of one of the
> LRU lists and works on those folios one by one. For a suitable
> swap-backed folio, if the swap device is async, it queues that folio for
> writeback. After the page reclaim finishes an entire batch, it puts back
> the folios it queued for writeback to the head of the original LRU list.
>
> In the meantime, the page writeback flushes the queued folios also by
> batches. Its batching logic is independent from that of the page reclaim.
> For each of the folios it writes back, the page writeback calls
> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>
> folio_rotate_reclaimable() only works for a folio after the page reclaim
> has put it back. If an async swap device is fast enough, the page
> writeback can finish with that folio while the page reclaim is still
> working on the rest of the batch containing it. In this case, that folio
> will remain at the head and the page reclaim will not retry it before
> reaching there.

For a single patch series I think it's best to just make it a single
patch! No need for a [0/n]: just put all the info into the patch's
changelog.

The patch doesn't apply to current development kernels. Please check
the mm-unstable branch of
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
linux-next.

Please replace vmascan with vmscan in the title.
On 2024/12/10 10:13, Andrew Morton wrote:
> On Mon, 9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>> while isolated") only fixed the issue for mglru. However, this issue
>> also exists in the traditional active/inactive LRU. Fix this issue
>> in the same way for active/inactive lru.
>>
>> What is fixed:
>> The page reclaim isolates a batch of folios from the tail of one of the
>> LRU lists and works on those folios one by one. For a suitable
>> swap-backed folio, if the swap device is async, it queues that folio for
>> writeback. After the page reclaim finishes an entire batch, it puts back
>> the folios it queued for writeback to the head of the original LRU list.
>>
>> In the meantime, the page writeback flushes the queued folios also by
>> batches. Its batching logic is independent from that of the page reclaim.
>> For each of the folios it writes back, the page writeback calls
>> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>>
>> folio_rotate_reclaimable() only works for a folio after the page reclaim
>> has put it back. If an async swap device is fast enough, the page
>> writeback can finish with that folio while the page reclaim is still
>> working on the rest of the batch containing it. In this case, that folio
>> will remain at the head and the page reclaim will not retry it before
>> reaching there.
>
> For a single patch series I think it's best to just make it a single
> patch! No need for a [0/n]: just put all the info into the patch's
> changelog.
>
> The patch doesn't apply to current development kernels. Please check
> the mm-unstable branch of
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
> linux-next.
>
> Please replace vmascan with vmscan in the title.

Thanks, will update.

Best regards,
Ridong
From: Chen Ridong <chenridong@huawei.com>

The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") only fixed the issue for mglru. However, this issue
also exists in the traditional active/inactive LRU. Fix this issue
in the same way for active/inactive lru.

What is fixed:
The page reclaim isolates a batch of folios from the tail of one of the
LRU lists and works on those folios one by one. For a suitable
swap-backed folio, if the swap device is async, it queues that folio for
writeback. After the page reclaim finishes an entire batch, it puts back
the folios it queued for writeback to the head of the original LRU list.

In the meantime, the page writeback flushes the queued folios also by
batches. Its batching logic is independent from that of the page reclaim.
For each of the folios it writes back, the page writeback calls
folio_rotate_reclaimable() which tries to rotate a folio to the tail.

folio_rotate_reclaimable() only works for a folio after the page reclaim
has put it back. If an async swap device is fast enough, the page
writeback can finish with that folio while the page reclaim is still
working on the rest of the batch containing it. In this case, that folio
will remain at the head and the page reclaim will not retry it before
reaching there.

---
v4:
- combine patch 1 and patch 2 of v3 together.
- refine the commit message.
- fix build errors Reported-by: kernel test robot <lkp@intel.com>.
v3:
- fix this issue in the same way as multi-gen LRU.
v2:
- detect folios whose writeback has completed and move them to the tail
  of the lru, suggested by Barry Song
[2] https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
v1:
[1] https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/

Chen Ridong (1):
  mm: vmascan: retry folios written back while isolated for traditional
    LRU

 include/linux/mmzone.h |   3 +-
 mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
 2 files changed, 77 insertions(+), 34 deletions(-)