Message ID | 1687861992-8722-1-git-send-email-quic_charante@quicinc.com (mailing list archive) |
---|---|
State | New |
Series | [V2] mm: madvise: fix uneven accounting of psi |
On Tue, Jun 27, 2023 at 04:03:12PM +0530, Charan Teja Kalla wrote:
> A folio turns into a Workingset during:
> 1) shrink_active_list() placing the folio from active to inactive list.
> 2) When a workingset transition is happening during the folio refault.
>
> And when Workingset is set on a folio, PSI for memory can be accounted
> during a) That folio is being reclaimed and b) Refault of that folio.
>

Please help me understand why PSI for memory (I understood it as the
time spent in psi_memstall_enter() to psi_memstall_leave()) would be
accounted in (a), i.e. during reclaim. I understand that when a working

The (b) part is very clear.

> This accounting of PSI for memory is not consistent in the cases where
> clients use madvise(COLD/PAGEOUT) to deactivate or proactively reclaim a
> folio:
> a) A folio started at inactive and moved to active as part of accesses.
> Workingset is absent on the folio thus madvise(MADV_PAGEOUT) don't
> account such folios for PSI.
>
> b) When the same folio transition from inactive->active and then to
> inactive through shrink_active_list(). Workingset is set on the folio
> thus madvise(MADV_PAGEOUT) account such folios for PSI.
>
> c) When the same folio is part of active list directly as a result of
> folio refault and this was a workingset folio prior to eviction.
> Workingset is set on the folio thus both the operations of MADV_PAGEOUT
> and reclaim of the MADV_COLD operated folio account for PSI.
>
> d) madvise(MADV_COLD) transfers the folio from active list to inactive
> list. Such folios may not have the Workingset thus reclaim operation
> on such folio doesn't account for PSI.
>
> As said above, the MADV_PAGEOUT on a folio is accounts for memory PSI in
> b) and c) but not in a). Reclaim of a folio on which MADV_COLD is
> performed accounts memory PSI in c) but not in d) which is an
> inconsistent behaviour. Make this PSI accounting always consistent by
> turning a folio into a workingset one whenever it is leaving the active
> list. Also, accounting of PSI on a folio whenever it leaves the
> active list as part of the MADV_COLD/PAGEOUT operation helps the users
> whether they are operating on proper folios[1].

I understood the problem from the V1 discussions. But the references to
"madvise account such folios for PSI" are confusing. Why would
madvise(PAGEOUT) be accounting anything related to PSI? I get that
madvise() is messing up PSI accuracy indirectly.

>
> [1] https://lore.kernel.org/all/20230605180013.GD221380@cmpxchg.org/
>
> Suggested-by: Suren Baghdasaryan <surenb@google.com>
> Reported-by: Sai Manobhiram Manapragada <quic_smanapra@quicinc.com>
> Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> ---
> V2: Made changes as per the comments from Johannes/Suren.
>
> V1: https://lore.kernel.org/all/1685531374-6091-1-git-send-email-quic_charante@quicinc.com/
>
>  mm/madvise.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index d9e7b42..76fb31f 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -413,6 +413,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>
>  		folio_clear_referenced(folio);
>  		folio_test_clear_young(folio);
> +		folio_set_workingset(folio);
>  		if (pageout) {
>  			if (folio_isolate_lru(folio)) {
>  				if (folio_test_unevictable(folio))
> @@ -512,6 +513,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		 */
>  		folio_clear_referenced(folio);
>  		folio_test_clear_young(folio);
> +		folio_set_workingset(folio);
>  		if (pageout) {
>  			if (folio_isolate_lru(folio)) {
>  				if (folio_test_unevictable(folio))
> --
> 2.7.4
>

This is not limited to madvise(PAGEOUT), right? Anywhere an active page
is reclaimed, we have the same problem. For example: damon_pa_pageout()
and __alloc_contig_migrate_range()->reclaim_clean_pages_from_list().

If that is the case, can we mark a folio as a workingset when it is
activated? That way, we don't have to make madvise() a special case.

Thanks,
Pavan
Hi Charan,

thanks for fixing this. One comment:

On Tue, Jun 27, 2023 at 04:03:12PM +0530, Charan Teja Kalla wrote:
> @@ -413,6 +413,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>
>  		folio_clear_referenced(folio);
>  		folio_test_clear_young(folio);
> +		folio_set_workingset(folio);

Unless I'm missing something, this also includes inactive pages, which
is undesirable. Shouldn't this be:

	if (folio_test_active(folio))
		folio_set_workingset(folio);

> @@ -512,6 +513,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		 */
>  		folio_clear_referenced(folio);
>  		folio_test_clear_young(folio);
> +		folio_set_workingset(folio);

Here as well.
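The difference between the posted hunk and the active-only check suggested above can be sketched in a small user-space model. This is only an illustration, not kernel code: `struct toy_folio`, `toy_mark_unguarded()` and `toy_mark_guarded()` are hypothetical names, and the two booleans merely stand in for the active and workingset folio flags.

```c
#include <stdbool.h>

/* Hypothetical toy stand-in for a folio: only the two flags discussed here. */
struct toy_folio {
	bool active;     /* stands in for the active flag */
	bool workingset; /* stands in for the workingset flag */
};

/* Models the hunk as posted: every folio the walk touches is marked. */
static void toy_mark_unguarded(struct toy_folio *f)
{
	f->workingset = true;
}

/* Models the suggested check: only folios leaving the active list are marked. */
static void toy_mark_guarded(struct toy_folio *f)
{
	if (f->active)
		f->workingset = true;
}
```

In this model a folio that never reached the active list keeps workingset clear under the guarded variant, which appears to be the behaviour the review comment is asking for.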
Hi Pavan,

On 6/27/2023 7:26 PM, Pavan Kondeti wrote:
>> A folio turns into a Workingset during:
>> 1) shrink_active_list() placing the folio from active to inactive list.
>> 2) When a workingset transition is happening during the folio refault.
>>
>> And when Workingset is set on a folio, PSI for memory can be accounted
>> during a) That folio is being reclaimed and b) Refault of that folio.
>>
> Please help me understand why PSI for memory (I understood it as the
> time spent in psi_memstall_enter() to psi_memstall_leave()) would be
> accounted in (a) i.e during reclaim. I understand that when a working
>
> The (b) part is very clear.
>

I meant to say that, for usual reclaim, PSI is accounted on a folio both
during reclaim and during the refault operation when Workingset is set
on the folio, i.e., in both cases a) and b) above.

>> This accounting of PSI for memory is not consistent in the cases where
>> clients use madvise(COLD/PAGEOUT) to deactivate or proactively reclaim a
>> folio:

Seems I need to be explicit here. How about the below?

This accounting of PSI for memory is not consistent for the reclaim +
refault operation between usual reclaim and madvise(COLD/PAGEOUT), which
deactivates or proactively reclaims a folio:

Let me know if you have a better rephrasing.

>> a) A folio started at inactive and moved to active as part of accesses.
>> Workingset is absent on the folio thus madvise(MADV_PAGEOUT) don't
>> account such folios for PSI.
>>
>> b) When the same folio transition from inactive->active and then to
>> inactive through shrink_active_list(). Workingset is set on the folio
>> thus madvise(MADV_PAGEOUT) account such folios for PSI.
>>
>> c) When the same folio is part of active list directly as a result of
>> folio refault and this was a workingset folio prior to eviction.
>> Workingset is set on the folio thus both the operations of MADV_PAGEOUT
>> and reclaim of the MADV_COLD operated folio account for PSI.
>>
>> d) madvise(MADV_COLD) transfers the folio from active list to inactive
>> list. Such folios may not have the Workingset thus reclaim operation
>> on such folio doesn't account for PSI.
> This is not limited to madvise(PAGEOUT) right, anywhere an active page
> is reclaimed we have the same problem. For ex: damon_pa_pageout() and
> __alloc_contig_migrate_range()->reclaim_clean_pages_from_list().
> If that is the case, can we set mark a folio as a workingset when it is
> activated? That way, we don't have make madvise() as a special case?

I think marking the folio as a workingset while it sits on the active
list is not correct. For the same example you mentioned, a simple CMA
allocation will drop the clean pages instead of migrating them. PSI
accounting on refault of those pages doesn't reveal anything to the user.

Whereas in the madvise() cases, this PSI tells the user about the type
of pages that they are working on.[1]

BTW, damon_pa_pageout() seems a valid case above. Let me fix it in the
next patch.

[1] https://lore.kernel.org/all/20230605180013.GD221380@cmpxchg.org/
Thanks Johannes!!

On 6/27/2023 8:16 PM, Johannes Weiner wrote:
> Unless I'm missing something, this also includes inactive pages, which
> is undesirable. Shouldn't this be:
>
> 	if (folio_test_active(folio))
> 		folio_set_workingset(folio);

My bad. Let me fix it.
On Wed, Jun 28, 2023 at 04:19:01PM +0530, Charan Teja Kalla wrote:
> Hi Pavan,
>
> On 6/27/2023 7:26 PM, Pavan Kondeti wrote:
> >> A folio turns into a Workingset during:
> >> 1) shrink_active_list() placing the folio from active to inactive list.
> >> 2) When a workingset transition is happening during the folio refault.
> >>
> >> And when Workingset is set on a folio, PSI for memory can be accounted
> >> during a) That folio is being reclaimed and b) Refault of that folio.
> >>
> > Please help me understand why PSI for memory (I understood it as the
> > time spent in psi_memstall_enter() to psi_memstall_leave()) would be
> > accounted in (a) i.e during reclaim. I understand that when a working
> >
> > The (b) part is very clear.
> >
> I meant to say, for usual reclaim, PSI is accounted on a folio for both
> reclaim and as well during the refault operation when Workingset is set
> on a folio i.e., both a) and b) cases above.
>

Got it.

> >> This accounting of PSI for memory is not consistent in the cases where
> >> clients use madvise(COLD/PAGEOUT) to deactivate or proactively reclaim a
> >> folio:
>
> Seems I need to be explicit here. How about the below?
>
> This accounting of PSI for memory is not consistent for reclaim +
> refault operation between usual reclaim and madvise(COLD/PAGEOUT) which
> deactivate or proactively reclaim a folio:
>

Looks good.

> lmk for any better rephrasing?
> >> a) A folio started at inactive and moved to active as part of accesses.
> >> Workingset is absent on the folio thus madvise(MADV_PAGEOUT) don't
> >> account such folios for PSI.
> >>
> >> b) When the same folio transition from inactive->active and then to
> >> inactive through shrink_active_list(). Workingset is set on the folio
> >> thus madvise(MADV_PAGEOUT) account such folios for PSI.
> >>
> >> c) When the same folio is part of active list directly as a result of
> >> folio refault and this was a workingset folio prior to eviction.
> >> Workingset is set on the folio thus both the operations of MADV_PAGEOUT
> >> and reclaim of the MADV_COLD operated folio account for PSI.
> >>
> >> d) madvise(MADV_COLD) transfers the folio from active list to inactive
> >> list. Such folios may not have the Workingset thus reclaim operation
> >> on such folio doesn't account for PSI.
> > This is not limited to madvise(PAGEOUT) right, anywhere an active page
> > is reclaimed we have the same problem. For ex: damon_pa_pageout() and
> > __alloc_contig_migrate_range()->reclaim_clean_pages_from_list().
> > If that is the case, can we set mark a folio as a workingset when it is
> > activated? That way, we don't have make madvise() as a special case?
> I think marking the folio as a workingset when it sits on the active is
> not a correct thing. For the same example you mentioned, a simple CMA
> allocation will be dropping the clean pages instead of migration. PSI
> accounting on refault of those pages don't reveal anything to the user.
>

Agreed. Thanks for the clarification.

> Where as in the madvise() cases, this PSI tells the user about the type
> of pages that he is working on.[1]
>
> BTW, damon_pa_pageout() seems a valid case above. let me fix it in the
> next patch.

Looks good.

Thanks,
Pavan
Hi Pavan,

On 6/28/2023 4:19 PM, Charan Teja Kalla wrote:
> I think marking the folio as a workingset when it sits on the active is
> not a correct thing. For the same example you mentioned, a simple CMA
> allocation will be dropping the clean pages instead of migration. PSI
> accounting on refault of those pages don't reveal anything to the user.
>
> Where as in the madvise() cases, this PSI tells the user about the type
> of pages that he is working on.[1]
>
> BTW, damon_pa_pageout() seems a valid case above. let me fix it in the
> next patch.

I looked a little more at the DAMON code and, IIUC, DAMON monitors the
ranges it is asked to operate on as regions and operates (reclaims) on
the regions that have fewer accesses. IOW, DAMON won't do a pageout
operation on a folio if it is really in use, CMIW. This is unlike the
madvise() operation, where Workingset helps in accounting PSI, which in
turn tells the user about the type of folios they are operating on.

Assume that DAMON is operating on the wrong set of regions and
Workingset results in PSI being accounted. That is of no help to the
user; it only exposes the internals of DAMON. No?

Having said that, theoretically it seems correct to me to set workingset
on folios as they leave the active list, but I don't have any strong
reason to say what happens if we don't. Moreover, this patch is mostly
about madvise()-operated folios not being in line with the usual
reclaim. Maybe a separate change can be raised for DAMON-operated folios
once we agree upon the importance of Workingset to these folios. WDYT?

Thanks,
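For anyone wanting to observe the memory PSI numbers this thread keeps referring to, the sketch below parses one line of the pressure file. It assumes the documented `/proc/pressure/memory` line format (`some avg10=... avg60=... avg300=... total=...`, with `total` in microseconds); `struct psi_sample` and `psi_parse_line()` are hypothetical helper names, not part of any kernel or libc API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of a PSI pressure file. */
struct psi_sample {
	char kind[8];                /* "some" or "full" */
	double avg10, avg60, avg300; /* percentages over 10s/60s/300s windows */
	unsigned long long total_us; /* cumulative stall time in microseconds */
};

/*
 * Parse a single pressure line into *out.
 * Returns 0 on success, -1 if the line does not match the assumed format.
 */
static int psi_parse_line(const char *line, struct psi_sample *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total_us);

	return n == 5 ? 0 : -1;
}
```

A caller would typically read the file with fgets() before and after a madvise(MADV_PAGEOUT) plus re-access cycle and compare `total_us`; whether that counter moves is what the Workingset flag influences.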
diff --git a/mm/madvise.c b/mm/madvise.c
index d9e7b42..76fb31f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -413,6 +413,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 
 		folio_clear_referenced(folio);
 		folio_test_clear_young(folio);
+		folio_set_workingset(folio);
 		if (pageout) {
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
@@ -512,6 +513,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		 */
 		folio_clear_referenced(folio);
 		folio_test_clear_young(folio);
+		folio_set_workingset(folio);
 		if (pageout) {
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
A folio turns into a Workingset during:
1) shrink_active_list() placing the folio from active to inactive list.
2) When a workingset transition is happening during the folio refault.

And when Workingset is set on a folio, PSI for memory can be accounted
during a) reclaim of that folio and b) refault of that folio.

This accounting of PSI for memory is not consistent in the cases where
clients use madvise(COLD/PAGEOUT) to deactivate or proactively reclaim a
folio:
a) A folio started at inactive and moved to active as part of accesses.
Workingset is absent on the folio, thus madvise(MADV_PAGEOUT) doesn't
account such folios for PSI.

b) When the same folio transitions from inactive->active and then to
inactive through shrink_active_list(). Workingset is set on the folio,
thus madvise(MADV_PAGEOUT) accounts such folios for PSI.

c) When the same folio is part of the active list directly as a result of
folio refault and this was a workingset folio prior to eviction.
Workingset is set on the folio, thus both MADV_PAGEOUT and reclaim of
the MADV_COLD operated folio account for PSI.

d) madvise(MADV_COLD) transfers the folio from the active list to the
inactive list. Such folios may not have Workingset set, thus the reclaim
operation on such a folio doesn't account for PSI.

As said above, MADV_PAGEOUT on a folio accounts for memory PSI in
b) and c) but not in a). Reclaim of a folio on which MADV_COLD is
performed accounts memory PSI in c) but not in d), which is
inconsistent behaviour. Make this PSI accounting always consistent by
turning a folio into a workingset one whenever it is leaving the active
list. Also, accounting of PSI on a folio whenever it leaves the
active list as part of the MADV_COLD/PAGEOUT operation helps users
tell whether they are operating on proper folios[1].

[1] https://lore.kernel.org/all/20230605180013.GD221380@cmpxchg.org/

Suggested-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Sai Manobhiram Manapragada <quic_smanapra@quicinc.com>
Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
---
V2: Made changes as per the comments from Johannes/Suren.

V1: https://lore.kernel.org/all/1685531374-6091-1-git-send-email-quic_charante@quicinc.com/

 mm/madvise.c | 2 ++
 1 file changed, 2 insertions(+)
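The cases a) to d) in the commit message can be walked through with a deliberately simplified user-space model. Everything here is an assumption made for illustration: the `model_*` names are hypothetical, the list handling ignores locking, references and the real reclaim paths, and the `patched` flag models the change with the active-only check requested in review rather than the hunk exactly as posted.

```c
#include <stdbool.h>

enum model_list { MODEL_INACTIVE, MODEL_ACTIVE, MODEL_EVICTED };

/* Hypothetical toy folio: which list it is on plus the workingset flag. */
struct model_folio {
	enum model_list list;
	bool workingset;
};

/* A new folio starts on the inactive list without workingset. */
static struct model_folio model_new(void)
{
	return (struct model_folio){ MODEL_INACTIVE, false };
}

/* Accesses promote an inactive folio to the active list. */
static void model_activate(struct model_folio *f)
{
	if (f->list == MODEL_INACTIVE)
		f->list = MODEL_ACTIVE;
}

/* Regular aging: leaving the active list sets workingset. */
static void model_shrink_active(struct model_folio *f)
{
	if (f->list == MODEL_ACTIVE) {
		f->workingset = true;
		f->list = MODEL_INACTIVE;
	}
}

/* Eviction; the return value says whether a later refault would be accounted. */
static bool model_evict(struct model_folio *f)
{
	f->list = MODEL_EVICTED;
	return f->workingset;
}

/* MADV_COLD: deactivate; with the fix, active folios get workingset first. */
static void model_madvise_cold(struct model_folio *f, bool patched)
{
	if (f->list == MODEL_ACTIVE) {
		if (patched)
			f->workingset = true;
		f->list = MODEL_INACTIVE;
	}
}

/* MADV_PAGEOUT: reclaim right away; same marking rule as above. */
static bool model_madvise_pageout(struct model_folio *f, bool patched)
{
	if (patched && f->list == MODEL_ACTIVE)
		f->workingset = true;
	return model_evict(f);
}
```

In this model, case a) (activate, then pageout) is accounted only when `patched` is true, case b) (activate, age, pageout) is accounted either way, and case d) (activate, MADV_COLD, evict) again depends on `patched`, which mirrors the inconsistency the commit message describes.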