
mm/vmscan: check references from all memcgs for swapbacked memory

Message ID 20221004233446.787056-1-yosryahmed@google.com (mailing list archive)
State New
Series mm/vmscan: check references from all memcgs for swapbacked memory

Commit Message

Yosry Ahmed Oct. 4, 2022, 11:34 p.m. UTC
During page/folio reclaim, we check whether a folio is referenced using
folio_referenced() to avoid reclaiming folios that have been recently
accessed (hot memory). The rationale is that this memory is likely to be
accessed soon, and hence reclaiming it will cause a refault.

For memcg reclaim, we pass in sc->target_mem_cgroup to
folio_referenced(), which means we only check accesses to the folio
from processes in the subtree of the target memcg. This behavior was
originally introduced by commit bed7161a519a ("Memory controller: make
page_referenced() cgroup aware") a long time ago. Back then, refaulted
pages would get charged to the memcg of the process that was faulting them
in. It made sense to only consider accesses coming from processes in the
subtree of target_mem_cgroup. If a page was charged to memcg A but only
being accessed by a sibling memcg B, we would reclaim it if memcg A is
under pressure. memcg B can then fault it back in and get charged for it
appropriately.
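
For reference, folio_referenced() implements this restriction by skipping
VMAs whose mm is not in the target memcg's subtree. A simplified sketch of
the existing callback in mm/rmap.c, shown here only for context (roughly the
current code, not part of this patch):

	/* Simplified sketch of the existing cgroup filter in mm/rmap.c. */
	static bool invalid_folio_referenced_vma(struct vm_area_struct *vma,
						 void *arg)
	{
		struct folio_referenced_arg *pra = arg;
		struct mem_cgroup *memcg = pra->memcg;

		/*
		 * Skip counting references from a VMA whose mm does not
		 * belong to the subtree of the memcg we reclaim on behalf of.
		 */
		if (!mm_match_cgroup(vma->vm_mm, memcg))
			return true;

		return false;
	}

folio_referenced() only installs this callback (rwc.invalid_vma) when a memcg
is passed in, so passing NULL means references from all processes are counted.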

Today, this behavior still makes sense for file pages. However, unlike
file pages, when swapbacked pages are refaulted they are charged to the
memcg that was originally charged for them during swapout. Which
means that if a swapbacked page is charged to memcg A but only used by
memcg B, and we reclaim it when memcg A is under pressure, it would
simply be faulted back in and charged again to memcg A once memcg B
accesses it. In that sense, accesses from all memcgs matter equally when
considering if a swapbacked page/folio is a viable reclaim target.
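
For context, this recharging behavior comes from the swap charge path: at
swapout time the owning memcg's id is recorded for the swap entry, and the
swapin charge looks that id up again. A paraphrased sketch of the swapin
side, based on mm/memcontrol.c around this kernel version (simplified, so
details may differ):

	int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
					   gfp_t gfp, swp_entry_t entry)
	{
		struct mem_cgroup *memcg;
		unsigned short id;
		int ret;

		if (mem_cgroup_disabled())
			return 0;

		/* id of the memcg that owned the folio at swapout time */
		id = lookup_swap_cgroup_id(entry);
		rcu_read_lock();
		memcg = mem_cgroup_from_id(id);
		/* fall back to the faulting mm's memcg only if the owner is gone */
		if (!memcg || !css_tryget_online(&memcg->css))
			memcg = get_mem_cgroup_from_mm(mm);
		rcu_read_unlock();

		ret = charge_memcg(folio, memcg, gfp);

		css_put(&memcg->css);
		return ret;
	}

So as long as memcg A is alive, a swapbacked folio it once owned is charged
back to A on refault, no matter which memcg touches it.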

Add folio_referenced_memcg() which decides what memcg we should pass to
folio_referenced() based on the folio type, and includes an elaborate
comment about why we should do so. This should help reclaim make better
decisions and reduce refaults when reclaiming swapbacked memory that is
used by multiple memcgs.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/vmscan.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

Comments

Yosry Ahmed Oct. 5, 2022, 2:11 a.m. UTC | #1
On Tue, Oct 4, 2022 at 4:34 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> During page/folio reclaim, we check folio is referenced using
> folio_referenced() to avoid reclaiming folios that have been recently
> accessed (hot memory). The ratinale is that this memory is likely to be

s/ratinale/rationale

> accessed soon, and hence reclaiming it will cause a refault.
>
> For memcg reclaim, we pass in sc->target_mem_cgroup to
> folio_referenced(), which means we only check accesses to the folio
> from processes in the subtree of the target memcg. This behavior was
> originally introduced by commit bed7161a519a ("Memory controller: make
> page_referenced() cgroup aware") a long time ago. Back then, refaulted
> pages would get charged to the memcg of the process that was faulting them
> in. It made sense to only consider accesses coming from processes in the
> subtree of target_mem_cgroup. If a page was charged to memcg A but only
> being accessed by a sibling memcg B, we would reclaim it if memcg A is
> under pressure. memcg B can then fault it back in and get charged for it
> appropriately.
>
> Today, this behavior still makes sense for file pages. However, unlike
> file pages, when swapbacked pages are refaulted they are charged to the
> memcg that was originally charged for them during swapout. Which
> means that if a swapbacked page is charged to memcg A but only used by
> memcg B, and we reclaim it when memcg A is under pressure, it would
> simply be faulted back in and charged again to memcg A once memcg B
> accesses it. In that sense, accesses from all memcgs matter equally when
> considering if a swapbacked page/folio is a viable reclaim target.
>
> Add folio_referenced_memcg() which decides what memcg we should pass to
> folio_referenced() based on the folio type, and includes an elaborate
> comment about why we should do so. This should help reclaim make better
> decision and reduce refaults when reclaiming swapbacked memory that is

s/decision/decisions

> used by multiple memcgs.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/vmscan.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c5a4bff11da6..f9fa0f9287e5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1443,14 +1443,43 @@ enum folio_references {
>         FOLIOREF_ACTIVATE,
>  };
>
> +/* What memcg should we pass to folio_referenced()? */
> +static struct mem_cgroup *folio_referenced_memcg(struct folio *folio,
> +                                                struct mem_cgroup *target_memcg)
> +{
> +       /*
> +        * We check references to folios to make sure we don't reclaim hot
> +        * folios that are likely to be refaulted soon. We pass a memcg to
> +        * folio_referenced() to only check references coming from processes in
> +        * that memcg's subtree.
> +        *
> +        * For file folios, we only consider references from processes in the
> +        * subtree of the target memcg. If a folio is charged to
> +        * memcg A but is only referenced by processes in memcg B, we reclaim it
> +        * if memcg A is under pressure. If it is later accessed by memcg B it
> +        * will be faulted back in and charged to memcg B. For memcg A, this is
> +        * called memory that should be reclaimed.

s/called/cold

> +        *
> +        * On the other hand, when swapbacked folios are faulted in, they get
> +        * charged to the memcg that was originally charged for them at the time
> +        * of swapping out. This means that if a folio that is charged to
> +        * memcg A gets swapped out, it will get charged back to A when *any*
> +        * memcg accesses it. In that sense, we need to consider references from
> +        * *all* processes when considering whether to reclaim a swapbacked
> +        * folio.
> +        */
> +       return folio_test_swapbacked(folio) ? NULL : target_memcg;
> +}
> +
>  static enum folio_references folio_check_references(struct folio *folio,
>                                                   struct scan_control *sc)
>  {
>         int referenced_ptes, referenced_folio;
>         unsigned long vm_flags;
> +       struct mem_cgroup *memcg = folio_referenced_memcg(folio,
> +                                               sc->target_mem_cgroup);
>
> -       referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
> -                                          &vm_flags);
> +       referenced_ptes = folio_referenced(folio, 1, memcg, &vm_flags);
>         referenced_folio = folio_test_clear_referenced(folio);
>
>         /*
> @@ -2581,6 +2610,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>
>         while (!list_empty(&l_hold)) {
>                 struct folio *folio;
> +               struct mem_cgroup *memcg;
>
>                 cond_resched();
>                 folio = lru_to_folio(&l_hold);
> @@ -2600,8 +2630,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
>                 }
>
>                 /* Referenced or rmap lock contention: rotate */
> -               if (folio_referenced(folio, 0, sc->target_mem_cgroup,
> -                                    &vm_flags) != 0) {
> +               memcg = folio_referenced_memcg(folio, sc->target_mem_cgroup);
> +               if (folio_referenced(folio, 0, memcg, &vm_flags) != 0) {
>                         /*
>                          * Identify referenced, file-backed active folios and
>                          * give them one more trip around the active list. So
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
Johannes Weiner Oct. 5, 2022, 2:04 p.m. UTC | #2
Hi Yosry,

On Tue, Oct 04, 2022 at 11:34:46PM +0000, Yosry Ahmed wrote:
> During page/folio reclaim, we check folio is referenced using
> folio_referenced() to avoid reclaiming folios that have been recently
> accessed (hot memory). The ratinale is that this memory is likely to be
> accessed soon, and hence reclaiming it will cause a refault.
> 
> For memcg reclaim, we pass in sc->target_mem_cgroup to
> folio_referenced(), which means we only check accesses to the folio
> from processes in the subtree of the target memcg. This behavior was
> originally introduced by commit bed7161a519a ("Memory controller: make
> page_referenced() cgroup aware") a long time ago. Back then, refaulted
> pages would get charged to the memcg of the process that was faulting them
> in. It made sense to only consider accesses coming from processes in the
> subtree of target_mem_cgroup. If a page was charged to memcg A but only
> being accessed by a sibling memcg B, we would reclaim it if memcg A is
> under pressure. memcg B can then fault it back in and get charged for it
> appropriately.
> 
> Today, this behavior still makes sense for file pages. However, unlike
> file pages, when swapbacked pages are refaulted they are charged to the
> memcg that was originally charged for them during swapout. Which
> means that if a swapbacked page is charged to memcg A but only used by
> memcg B, and we reclaim it when memcg A is under pressure, it would
> simply be faulted back in and charged again to memcg A once memcg B
> accesses it. In that sense, accesses from all memcgs matter equally when
> considering if a swapbacked page/folio is a viable reclaim target.
> 
> Add folio_referenced_memcg() which decides what memcg we should pass to
> folio_referenced() based on the folio type, and includes an elaborate
> comment about why we should do so. This should help reclaim make better
> decision and reduce refaults when reclaiming swapbacked memory that is
> used by multiple memcgs.

Great observation, and I agree with this change.

Just one nitpick:

> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/vmscan.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c5a4bff11da6..f9fa0f9287e5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1443,14 +1443,43 @@ enum folio_references {
>  	FOLIOREF_ACTIVATE,
>  };
>  
> +/* What memcg should we pass to folio_referenced()? */
> +static struct mem_cgroup *folio_referenced_memcg(struct folio *folio,
> +						 struct mem_cgroup *target_memcg)
> +{
> +	/*
> +	 * We check references to folios to make sure we don't reclaim hot
> +	 * folios that are likely to be refaulted soon. We pass a memcg to
> +	 * folio_referenced() to only check references coming from processes in
> +	 * that memcg's subtree.
> +	 *
> +	 * For file folios, we only consider references from processes in the
> +	 * subtree of the target memcg. If a folio is charged to
> +	 * memcg A but is only referenced by processes in memcg B, we reclaim it
> +	 * if memcg A is under pressure. If it is later accessed by memcg B it
> +	 * will be faulted back in and charged to memcg B. For memcg A, this is
> +	 * called memory that should be reclaimed.
> +	 *
> +	 * On the other hand, when swapbacked folios are faulted in, they get
> +	 * charged to the memcg that was originally charged for them at the time
> +	 * of swapping out. This means that if a folio that is charged to
> +	 * memcg A gets swapped out, it will get charged back to A when *any*
> +	 * memcg accesses it. In that sense, we need to consider references from
> +	 * *all* processes when considering whether to reclaim a swapbacked
> +	 * folio.
> +	 */
> +	return folio_test_swapbacked(folio) ? NULL : target_memcg;
> +}
> +
>  static enum folio_references folio_check_references(struct folio *folio,
>  						  struct scan_control *sc)
>  {
>  	int referenced_ptes, referenced_folio;
>  	unsigned long vm_flags;
> +	struct mem_cgroup *memcg = folio_referenced_memcg(folio,
> +						sc->target_mem_cgroup);
>  
> -	referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
> -					   &vm_flags);
> +	referenced_ptes = folio_referenced(folio, 1, memcg, &vm_flags);
>  	referenced_folio = folio_test_clear_referenced(folio);
>  
>  	/*
> @@ -2581,6 +2610,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  
>  	while (!list_empty(&l_hold)) {
>  		struct folio *folio;
> +		struct mem_cgroup *memcg;
>  
>  		cond_resched();
>  		folio = lru_to_folio(&l_hold);
> @@ -2600,8 +2630,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  		}
>  
>  		/* Referenced or rmap lock contention: rotate */
> -		if (folio_referenced(folio, 0, sc->target_mem_cgroup,
> -				     &vm_flags) != 0) {
> +		memcg = folio_referenced_memcg(folio, sc->target_mem_cgroup);
> +		if (folio_referenced(folio, 0, memcg, &vm_flags) != 0) {

Would you mind moving this to folio_referenced() directly? There is
already a comment and branch in there that IMO would extend quite
naturally to cover the new exception:

	/*
	 * If we are reclaiming on behalf of a cgroup, skip
	 * counting on behalf of references from different
	 * cgroups
	 */
	if (memcg) {
		rwc.invalid_vma = invalid_folio_referenced_vma;
	}

That would keep the decision-making and doc in one place.

Thanks!
Johannes
Yosry Ahmed Oct. 5, 2022, 2:54 p.m. UTC | #3
On Wed, Oct 5, 2022 at 7:04 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hi Yosry,
>
> On Tue, Oct 04, 2022 at 11:34:46PM +0000, Yosry Ahmed wrote:
> > During page/folio reclaim, we check folio is referenced using
> > folio_referenced() to avoid reclaiming folios that have been recently
> > accessed (hot memory). The ratinale is that this memory is likely to be
> > accessed soon, and hence reclaiming it will cause a refault.
> >
> > For memcg reclaim, we pass in sc->target_mem_cgroup to
> > folio_referenced(), which means we only check accesses to the folio
> > from processes in the subtree of the target memcg. This behavior was
> > originally introduced by commit bed7161a519a ("Memory controller: make
> > page_referenced() cgroup aware") a long time ago. Back then, refaulted
> > pages would get charged to the memcg of the process that was faulting them
> > in. It made sense to only consider accesses coming from processes in the
> > subtree of target_mem_cgroup. If a page was charged to memcg A but only
> > being accessed by a sibling memcg B, we would reclaim it if memcg A is
> > under pressure. memcg B can then fault it back in and get charged for it
> > appropriately.
> >
> > Today, this behavior still makes sense for file pages. However, unlike
> > file pages, when swapbacked pages are refaulted they are charged to the
> > memcg that was originally charged for them during swapout. Which
> > means that if a swapbacked page is charged to memcg A but only used by
> > memcg B, and we reclaim it when memcg A is under pressure, it would
> > simply be faulted back in and charged again to memcg A once memcg B
> > accesses it. In that sense, accesses from all memcgs matter equally when
> > considering if a swapbacked page/folio is a viable reclaim target.
> >
> > Add folio_referenced_memcg() which decides what memcg we should pass to
> > folio_referenced() based on the folio type, and includes an elaborate
> > comment about why we should do so. This should help reclaim make better
> > decision and reduce refaults when reclaiming swapbacked memory that is
> > used by multiple memcgs.
>
> Great observation, and I agree with this change.
>
> Just one nitpick:
>
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  mm/vmscan.c | 38 ++++++++++++++++++++++++++++++++++----
> >  1 file changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c5a4bff11da6..f9fa0f9287e5 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1443,14 +1443,43 @@ enum folio_references {
> >       FOLIOREF_ACTIVATE,
> >  };
> >
> > +/* What memcg should we pass to folio_referenced()? */
> > +static struct mem_cgroup *folio_referenced_memcg(struct folio *folio,
> > +                                              struct mem_cgroup *target_memcg)
> > +{
> > +     /*
> > +      * We check references to folios to make sure we don't reclaim hot
> > +      * folios that are likely to be refaulted soon. We pass a memcg to
> > +      * folio_referenced() to only check references coming from processes in
> > +      * that memcg's subtree.
> > +      *
> > +      * For file folios, we only consider references from processes in the
> > +      * subtree of the target memcg. If a folio is charged to
> > +      * memcg A but is only referenced by processes in memcg B, we reclaim it
> > +      * if memcg A is under pressure. If it is later accessed by memcg B it
> > +      * will be faulted back in and charged to memcg B. For memcg A, this is
> > +      * called memory that should be reclaimed.
> > +      *
> > +      * On the other hand, when swapbacked folios are faulted in, they get
> > +      * charged to the memcg that was originally charged for them at the time
> > +      * of swapping out. This means that if a folio that is charged to
> > +      * memcg A gets swapped out, it will get charged back to A when *any*
> > +      * memcg accesses it. In that sense, we need to consider references from
> > +      * *all* processes when considering whether to reclaim a swapbacked
> > +      * folio.
> > +      */
> > +     return folio_test_swapbacked(folio) ? NULL : target_memcg;
> > +}
> > +
> >  static enum folio_references folio_check_references(struct folio *folio,
> >                                                 struct scan_control *sc)
> >  {
> >       int referenced_ptes, referenced_folio;
> >       unsigned long vm_flags;
> > +     struct mem_cgroup *memcg = folio_referenced_memcg(folio,
> > +                                             sc->target_mem_cgroup);
> >
> > -     referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
> > -                                        &vm_flags);
> > +     referenced_ptes = folio_referenced(folio, 1, memcg, &vm_flags);
> >       referenced_folio = folio_test_clear_referenced(folio);
> >
> >       /*
> > @@ -2581,6 +2610,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
> >
> >       while (!list_empty(&l_hold)) {
> >               struct folio *folio;
> > +             struct mem_cgroup *memcg;
> >
> >               cond_resched();
> >               folio = lru_to_folio(&l_hold);
> > @@ -2600,8 +2630,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
> >               }
> >
> >               /* Referenced or rmap lock contention: rotate */
> > -             if (folio_referenced(folio, 0, sc->target_mem_cgroup,
> > -                                  &vm_flags) != 0) {
> > +             memcg = folio_referenced_memcg(folio, sc->target_mem_cgroup);
> > +             if (folio_referenced(folio, 0, memcg, &vm_flags) != 0) {
>
> Would you mind moving this to folio_referenced() directly? There is
> already a comment and branch in there that IMO would extend quite
> naturally to cover the new exception:
>
>         /*
>          * If we are reclaiming on behalf of a cgroup, skip
>          * counting on behalf of references from different
>          * cgroups
>          */
>         if (memcg) {
>                 rwc.invalid_vma = invalid_folio_referenced_vma;
>         }
>
> That would keep the decision-making and doc in one place.

Hi Johannes,

Thanks for taking a look!

I originally wanted to make the change in folio_referenced(). My only
concern was that it wouldn't be clear for people looking at reclaim
code in mm/vmscan.c. It would appear as if we are passing in the
target memcg to folio_referenced(), and only if you look within you
would realize that sometimes it ignores the passed memcg.

It seemed to me that deciding whether we want to check references from
one memcg or all of them is a reclaim decision, while
folio_referenced() is just an rmap API that does what it is told: "if
I am passed a memcg, I only look at references coming from this
memcg". On the other hand, it looks like the doc has always lived in
folio_referenced()/page_referenced(), so I might be overthinking this
(I have been known to do this).

Anyway, I don't feel strongly, just wanted to share my thoughts on the
matter :)
Let me know what you prefer.

>
> Thanks!
> Johannes
Johannes Weiner Oct. 5, 2022, 3:51 p.m. UTC | #4
On Wed, Oct 05, 2022 at 07:54:25AM -0700, Yosry Ahmed wrote:
> On Wed, Oct 5, 2022 at 7:04 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > Would you mind moving this to folio_referenced() directly? There is
> > already a comment and branch in there that IMO would extend quite
> > naturally to cover the new exception:
> >
> >         /*
> >          * If we are reclaiming on behalf of a cgroup, skip
> >          * counting on behalf of references from different
> >          * cgroups
> >          */
> >         if (memcg) {
> >                 rwc.invalid_vma = invalid_folio_referenced_vma;
> >         }
> >
> > That would keep the decision-making and doc in one place.
> 
> Hi Johannes,
> 
> Thanks for taking a look!
> 
> I originally wanted to make the change in folio_referenced(). My only
> concern was that it wouldn't be clear for people looking at reclaim
> code in mm/vmscan.c. It would appear as if we are passing in the
> target memcg to folio_referenced(), and only if you look within you
> would realize that sometimes it ignores the passed memcg.
> 
> It seemed to me that deciding whether we want to check references from
> one memcg or all of them is a reclaim decision, while
> folio_referenced() is just an rmap API that does what it is told: "if
> I am passed a memcg, I only look at references coming from this
> memcg". On the other hand, it looks like the doc has always lived in
> folio_referenced()/page_referenced(), so I might be overthinking this
> (I have been known to do this).

I agree it would be nicer to have this policy in vmscan.c. OTOH it's a
policy that applies to all folio_referenced() callers, and it's
fragile to require them to opt into it individually.

Vmscan is the only user of the function, so it's not the worst thing
to treat it as an extension of the reclaim code.

If it helps convince you, there is another, actually quite similar
reclaim policy already encoded in folio_referenced():

			if (ptep_clear_flush_young_notify(vma, address,
						pvmw.pte)) {
				/*
				 * Don't treat a reference through
				 * a sequentially read mapping as such.
				 * If the folio has been used in another mapping,
				 * we will catch it; if this other mapping is
				 * already gone, the unmap path will have set
				 * the referenced flag or activated the folio.
				 */
				if (likely(!(vma->vm_flags & VM_SEQ_READ)))
					referenced++;
			}
Yosry Ahmed Oct. 5, 2022, 3:55 p.m. UTC | #5
On Wed, Oct 5, 2022 at 8:51 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Oct 05, 2022 at 07:54:25AM -0700, Yosry Ahmed wrote:
> > On Wed, Oct 5, 2022 at 7:04 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > Would you mind moving this to folio_referenced() directly? There is
> > > already a comment and branch in there that IMO would extend quite
> > > naturally to cover the new exception:
> > >
> > >         /*
> > >          * If we are reclaiming on behalf of a cgroup, skip
> > >          * counting on behalf of references from different
> > >          * cgroups
> > >          */
> > >         if (memcg) {
> > >                 rwc.invalid_vma = invalid_folio_referenced_vma;
> > >         }
> > >
> > > That would keep the decision-making and doc in one place.
> >
> > Hi Johannes,
> >
> > Thanks for taking a look!
> >
> > I originally wanted to make the change in folio_referenced(). My only
> > concern was that it wouldn't be clear for people looking at reclaim
> > code in mm/vmscan.c. It would appear as if we are passing in the
> > target memcg to folio_referenced(), and only if you look within you
> > would realize that sometimes it ignores the passed memcg.
> >
> > It seemed to me that deciding whether we want to check references from
> > one memcg or all of them is a reclaim decision, while
> > folio_referenced() is just an rmap API that does what it is told: "if
> > I am passed a memcg, I only look at references coming from this
> > memcg". On the other hand, it looks like the doc has always lived in
> > folio_referenced()/page_referenced(), so I might be overthinking this
> > (I have been known to do this).
>
> I agree it would be nicer to have this policy in vmscan.c. OTOH it's a
> policy that applies to all folio_referenced() callers, and it's
> fragile to require them to opt into it individually.
>
> Vmscan is the only user of the function, so it's not the worst thing
> to treat it as an extension of the reclaim code.
>
> If it helps convince you, there is another, actually quite similar
> reclaim policy already encoded in folio_referenced():
>
>                         if (ptep_clear_flush_young_notify(vma, address,
>                                                 pvmw.pte)) {
>                                 /*
>                                  * Don't treat a reference through
>                                  * a sequentially read mapping as such.
>                                  * If the folio has been used in another mapping,
>                                  * we will catch it; if this other mapping is
>                                  * already gone, the unmap path will have set
>                                  * the referenced flag or activated the folio.
>                                  */
>                                 if (likely(!(vma->vm_flags & VM_SEQ_READ)))
>                                         referenced++;
>                         }

Thanks for clarifying. Will send v2 later today :)
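
For illustration only, the direction agreed on above (keying the cgroup
filter off folio_test_swapbacked() inside folio_referenced() itself) might
look roughly like the following sketch against mm/rmap.c; this is a guess at
the shape of v2, not the actual follow-up patch:

	/*
	 * If we are reclaiming on behalf of a cgroup, skip counting on
	 * behalf of references from different cgroups.
	 *
	 * Swapbacked folios are recharged to their original memcg when
	 * they are faulted back in, so references from all cgroups matter
	 * for them; do not install the cgroup filter in that case.
	 */
	if (memcg && !folio_test_swapbacked(folio))
		rwc.invalid_vma = invalid_folio_referenced_vma;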

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c5a4bff11da6..f9fa0f9287e5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1443,14 +1443,43 @@  enum folio_references {
 	FOLIOREF_ACTIVATE,
 };
 
+/* What memcg should we pass to folio_referenced()? */
+static struct mem_cgroup *folio_referenced_memcg(struct folio *folio,
+						 struct mem_cgroup *target_memcg)
+{
+	/*
+	 * We check references to folios to make sure we don't reclaim hot
+	 * folios that are likely to be refaulted soon. We pass a memcg to
+	 * folio_referenced() to only check references coming from processes in
+	 * that memcg's subtree.
+	 *
+	 * For file folios, we only consider references from processes in the
+	 * subtree of the target memcg. If a folio is charged to
+	 * memcg A but is only referenced by processes in memcg B, we reclaim it
+	 * if memcg A is under pressure. If it is later accessed by memcg B it
+	 * will be faulted back in and charged to memcg B. For memcg A, this is
+	 * called memory that should be reclaimed.
+	 *
+	 * On the other hand, when swapbacked folios are faulted in, they get
+	 * charged to the memcg that was originally charged for them at the time
+	 * of swapping out. This means that if a folio that is charged to
+	 * memcg A gets swapped out, it will get charged back to A when *any*
+	 * memcg accesses it. In that sense, we need to consider references from
+	 * *all* processes when considering whether to reclaim a swapbacked
+	 * folio.
+	 */
+	return folio_test_swapbacked(folio) ? NULL : target_memcg;
+}
+
 static enum folio_references folio_check_references(struct folio *folio,
 						  struct scan_control *sc)
 {
 	int referenced_ptes, referenced_folio;
 	unsigned long vm_flags;
+	struct mem_cgroup *memcg = folio_referenced_memcg(folio,
+						sc->target_mem_cgroup);
 
-	referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
-					   &vm_flags);
+	referenced_ptes = folio_referenced(folio, 1, memcg, &vm_flags);
 	referenced_folio = folio_test_clear_referenced(folio);
 
 	/*
@@ -2581,6 +2610,7 @@  static void shrink_active_list(unsigned long nr_to_scan,
 
 	while (!list_empty(&l_hold)) {
 		struct folio *folio;
+		struct mem_cgroup *memcg;
 
 		cond_resched();
 		folio = lru_to_folio(&l_hold);
@@ -2600,8 +2630,8 @@  static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		/* Referenced or rmap lock contention: rotate */
-		if (folio_referenced(folio, 0, sc->target_mem_cgroup,
-				     &vm_flags) != 0) {
+		memcg = folio_referenced_memcg(folio, sc->target_mem_cgroup);
+		if (folio_referenced(folio, 0, memcg, &vm_flags) != 0) {
 			/*
 			 * Identify referenced, file-backed active folios and
 			 * give them one more trip around the active list. So