Message ID: 20240124124308.461-1-justinjiang@vivo.com (mailing list archive)
State: New
Series: mm:vmscan: shrink skip folios in the exiting task
On Wed, Jan 24, 2024 at 08:43:07PM +0800, Zhiguo Jiang wrote:
> If the folio being shrunk belongs to the exiting task, this folio should
> be freed in the task exit flow rather than being reclaimed in the shrink
> flow, because the former takes less time.
>
> If a folio belonging to the exiting task is reclaimed in the shrink
> flow, such as an anon folio, the anon folio first needs to be written
> out to the swap partition in the shrink flow, and then the
> corresponding swap folio needs to be released in the task exiting flow.
> As is well known, releasing a swap folio takes more time than directly
> releasing an anon folio.
>
> In the scenarios of a low-memory system and mutil backed-applications,
> the time-consuming problem caused by shrinking the exiting task's folios
> will be more severe.

What testing have you done of this patch?  How often does it happen?
Are there particular workloads that benefit from this?  (I'm not sure
what "mutil backed-applications" are?)

And I do mean specifically of this patch, because to my eyes it
shouldn't even compile.
On 2024/1/24 21:20, Matthew Wilcox wrote:
> On Wed, Jan 24, 2024 at 08:43:07PM +0800, Zhiguo Jiang wrote:
>> If the folio being shrunk belongs to the exiting task, this folio should
>> be freed in the task exit flow rather than being reclaimed in the shrink
>> flow, because the former takes less time.
>>
>> If a folio belonging to the exiting task is reclaimed in the shrink
>> flow, such as an anon folio, the anon folio first needs to be written
>> out to the swap partition in the shrink flow, and then the
>> corresponding swap folio needs to be released in the task exiting flow.
>> As is well known, releasing a swap folio takes more time than directly
>> releasing an anon folio.
>>
>> In the scenarios of a low-memory system and mutil backed-applications,
>> the time-consuming problem caused by shrinking the exiting task's folios
>> will be more severe.
> What testing have you done of this patch?  How often does it happen?
> Are there particular workloads that benefit from this?  (I'm not sure
> what "mutil backed-applications" are?)

1. Yes, this patch has been tested.

2. When task exit and shrink_inactive_list occur at the same time, the
folios which shrink_inactive_list reclaims may be the exiting tasks'
folios in the lruvecs. And when the system is low on memory, this is
more likely to occur, because more background applications will be
killed. shrink_inactive_list reclaims the exiting tasks' folios in the
lruvecs and turns the exiting tasks' anon folios into swap memory,
which increases the load of the exiting tasks.

3. This patch can alleviate the load of the task exit process: it lets
the exiting tasks release their anon folios faster, instead of
releasing the swap memory produced by shrink_inactive_list swapping
out their anon folios.

4. "mutil backed-applications" means that there are a large number of
background applications in the system.
Thanks

> And I do mean specifically of this patch, because to my eyes it
> shouldn't even compile.

It has been tested.
On Thu, Jan 25, 2024 at 09:34:53AM +0800, zhiguojiang wrote:
>>> In the scenarios of a low-memory system and mutil backed-applications,
>>> the time-consuming problem caused by shrinking the exiting task's folios
>>> will be more severe.
>> What testing have you done of this patch?  How often does it happen?
>> Are there particular workloads that benefit from this?  (I'm not sure
>> what "mutil backed-applications" are?)
> 1. Yes, this patch has been tested.
>
> 2. When task exit and shrink_inactive_list occur at the same time, the
> folios which shrink_inactive_list reclaims may be the exiting tasks'
> folios in the lruvecs. And when the system is low on memory, this is
> more likely to occur, because more background applications will be
> killed. shrink_inactive_list reclaims the exiting tasks' folios in the
> lruvecs and turns the exiting tasks' anon folios into swap memory,
> which increases the load of the exiting tasks.

Ah, we're talking about an OOM scenario.  OK, that makes some sense.
You should have mentioned that in the patch description.

> 3. This patch can alleviate the load of the task exit process: it lets
> the exiting tasks release their anon folios faster, instead of
> releasing the swap memory produced by shrink_inactive_list swapping
> out their anon folios.
>
> 4. "mutil backed-applications" means that there are a large number of
> background applications in the system.
>
> Thanks
>
>> And I do mean specifically of this patch, because to my eyes it
>> shouldn't even compile.
> It has been tested.

That's odd.  I thought GCC used to complain about

	long x = 0x100000000;

but it does the right thing.  Except on 32-bit, where it'll say
"warning: integer constant out of range".

In any case, the number you chose is not going to work on 32-bit
systems, nor on ARM or x86: it conflicts with protection keys on x86
and MTE on ARM.

We can do it without any new magic numbers.
I'm not sure this is a great approach, but this should work:

+++ b/mm/rmap.c
@@ -840,6 +840,12 @@ static bool folio_referenced_one(struct folio *folio,
 	int referenced = 0;
 	unsigned long start = address, ptes = 0;
 
+	/* Skip this folio if it's mapped by an exiting task */
+	if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
+		pra->referenced = -1;
+		return false;
+	}
+
 	while (page_vma_mapped_walk(&pvmw)) {
 		address = pvmw.address;
On 2024/1/28 14:35, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 09:34:53AM +0800, zhiguojiang wrote:
>>>> In the scenarios of a low-memory system and mutil backed-applications,
>>>> the time-consuming problem caused by shrinking the exiting task's folios
>>>> will be more severe.
>>> What testing have you done of this patch?  How often does it happen?
>>> Are there particular workloads that benefit from this?  (I'm not sure
>>> what "mutil backed-applications" are?)
>> 1. Yes, this patch has been tested.
>>
>> 2. When task exit and shrink_inactive_list occur at the same time, the
>> folios which shrink_inactive_list reclaims may be the exiting tasks'
>> folios in the lruvecs. And when the system is low on memory, this is
>> more likely to occur, because more background applications will be
>> killed. shrink_inactive_list reclaims the exiting tasks' folios in the
>> lruvecs and turns the exiting tasks' anon folios into swap memory,
>> which increases the load of the exiting tasks.
> Ah, we're talking about an OOM scenario.  OK, that makes some sense.
> You should have mentioned that in the patch description.

Hi,

1. OK, I will provide a more comprehensive description in the next
version.

2. I think this issue can occur not only in the OOM scenario, but also
in the normal task exit scenario. So:
1) if (unlikely(!atomic_read(&vma->vm_mm->mm_users))) covers the
scenario where the task exits normally.
2) if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) covers the OOM
reaper scenario.

3. MMF_OOM_SKIP can only represent the OOM scenario, not the normal
task exit scenario: by the time MMF_OOM_SKIP is set on the normal task
exit path, the memory folios of the task have already been released.
shrink_inactive_list should recognize the lru folios of an exiting
task before that task releases its memory folios:

	tlb_gather_mmu_fullmm(&tlb, mm);
	/* update_hiwater_rss(mm) here? but nobody should be looking */
	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
	unmap_vmas(&tlb, &mas, vma, 0, ULONG_MAX, ULONG_MAX, false);
	mmap_read_unlock(mm);

	/*
	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
	 * because the memory has been already freed.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);

>> 3. This patch can alleviate the load of the task exit process: it lets
>> the exiting tasks release their anon folios faster, instead of
>> releasing the swap memory produced by shrink_inactive_list swapping
>> out their anon folios.
>>
>> 4. "mutil backed-applications" means that there are a large number of
>> background applications in the system.
>>
>> Thanks
>>> And I do mean specifically of this patch, because to my eyes it
>>> shouldn't even compile.
>> It has been tested.
> That's odd.  I thought GCC used to complain about
>
> 	long x = 0x100000000;
>
> but it does the right thing.  Except on 32-bit, where it'll say
> "warning: integer constant out of range".
>
> In any case, the number you chose is not going to work on 32-bit
> systems, nor on ARM or x86: it conflicts with protection keys on x86
> and MTE on ARM.

You're right, I overlooked the situation with the 32-bit system.

> We can do it without any new magic numbers.  I'm not sure this is a
> great approach, but this should work:
>
> +++ b/mm/rmap.c
> @@ -840,6 +840,12 @@ static bool folio_referenced_one(struct folio *folio,
>  	int referenced = 0;
>  	unsigned long start = address, ptes = 0;
>  
> +	/* Skip this folio if it's mapped by an exiting task */
> +	if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
> +		pra->referenced = -1;
> +		return false;
> +	}
> +
>  	while (page_vma_mapped_walk(&pvmw)) {
>  		address = pvmw.address;

I agree with your point of view. I think this is currently the best
solution, but I think it also needs to be combined with:

	if (unlikely(!atomic_read(&vma->vm_mm->mm_users)))

Please help review it again.

Best Regards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 42652bc8ceca..67e84b7d1928
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -312,6 +312,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
 #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
 #define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
+#define VM_EXITING	0x100000000	/* The task which this vma belongs to is exiting */
 
 #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
 #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1cf2bffa48ed..deb419fd095b
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -943,6 +943,18 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
 	if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 		return true;
 
+	/*
+	 * Skip counting references of the folios in the exiting task;
+	 * these folios will be freed in the task exit flow rather than
+	 * being reclaimed in the shrink flow, which takes less time.
+	 */
+	if (likely(vma->vm_mm) &&
+	    unlikely(!atomic_read(&vma->vm_mm->mm_users) ||
+		     test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags))) {
+		pra->vm_flags |= VM_EXITING;
+		return true;
+	}
+
 	return false;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b3ed3c9c3e3d..40ea4128044c 100755
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -887,6 +887,10 @@ static enum folio_references folio_check_references(struct folio *folio,
 		return FOLIOREF_KEEP;
 	}
 
+	/* The folio belongs to the exiting task: rotate */
+	if (vm_flags & VM_EXITING)
+		return FOLIOREF_KEEP;
+
 	/* Reclaim if clean, defer dirty folios to writeback */
 	if (referenced_folio && folio_is_file_lru(folio))
 		return FOLIOREF_RECLAIM_CLEAN;
If the folio being shrunk belongs to the exiting task, this folio should
be freed in the task exit flow rather than being reclaimed in the shrink
flow, because the former takes less time.

If a folio belonging to the exiting task is reclaimed in the shrink
flow, such as an anon folio, the anon folio first needs to be written
out to the swap partition in the shrink flow, and then the
corresponding swap folio needs to be released in the task exiting flow.
As is well known, releasing a swap folio takes more time than directly
releasing an anon folio.

In the scenarios of a low-memory system and mutil backed-applications,
the time-consuming problem caused by shrinking the exiting task's folios
will be more severe.

Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
---
 include/linux/mm.h |  1 +
 mm/rmap.c          | 12 ++++++++++++
 mm/vmscan.c        |  4 ++++
 3 files changed, 17 insertions(+)
 mode change 100644 => 100755 include/linux/mm.h
 mode change 100644 => 100755 mm/rmap.c