mm: add swappiness=max arg to memory.reclaim for only anon reclaim

Message ID 20250318135330.3358345-1-hezhongkun.hzk@bytedance.com (mailing list archive)
State New

Commit Message

Zhongkun He March 18, 2025, 1:53 p.m. UTC
With commit 68cd9050d871 ("mm: add swappiness= arg to memory.reclaim"),
we can pass an additional swappiness=<val> argument to memory.reclaim.
This is very useful because we can dynamically adjust the reclamation
ratio based on the anonymous folios and file folios of each cgroup. For
example, when swappiness is set to 0, we only reclaim from file folios.

However, we have also encountered a new issue: when swappiness is set to
MAX_SWAPPINESS, reclaim may still only take file folios.

So we propose adding a new argument 'swappiness=max' to memory.reclaim,
with which proactive memory reclaim only reclaims from anonymous folios.
The swappiness semantics from a user perspective remain unchanged.

For example, something like this:

echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios. Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available. For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with zram/zswap, the IO tradeoff that
cache_trim_mode or other file-first logic makes doesn't hold: file
refaults cause IO, whereas anon decompression does not.

With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', which means reclaiming only from anonymous folios in both
traditional LRU and MGLRU.
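
As a rough illustration (not part of this patch), a userspace program
could issue the same request as the echo example above. The helper name
request_reclaim() and its error handling are assumptions made for this
sketch; only the "<amount> swappiness=<mode>" request format comes from
the text above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): write
 * "<amount> swappiness=<mode>" to a memory.reclaim file.
 * Returns 0 on success or a negative errno value.
 */
static int request_reclaim(const char *path, const char *amount,
			   const char *mode)
{
	char req[64];
	ssize_t written;
	int len, fd, ret = 0;

	len = snprintf(req, sizeof(req), "%s swappiness=%s", amount, mode);
	if (len < 0 || (size_t)len >= sizeof(req))
		return -EINVAL;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* Send the request in one write() so it arrives as a single buffer. */
	written = write(fd, req, (size_t)len);
	if (written < 0)
		ret = -errno;
	else if (written != len)
		ret = -EIO;

	close(fd);
	return ret;
}
```

With the patch applied, request_reclaim("/sys/fs/cgroup/memory.reclaim",
"2M", "max") would correspond to the echo command above. A negative
return value carries whatever errno the kernel reported for the write.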

Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/

Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  4 ++++
 include/linux/swap.h                    |  4 ++++
 mm/memcontrol.c                         |  5 +++++
 mm/vmscan.c                             | 10 ++++++++++
 4 files changed, 23 insertions(+)

Comments

Yosry Ahmed March 18, 2025, 2:10 p.m. UTC | #1
On Tue, Mar 18, 2025 at 09:53:30PM +0800, Zhongkun He wrote:
> With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> memory.reclaim")', we can submit an additional swappiness=<val> argument
> to memory.reclaim. It is very useful because we can dynamically adjust
> the reclamation ratio based on the anonymous folios and file folios of
> each cgroup. For example,when swappiness is set to 0, we only reclaim
> from file folios.
> 
> However,we have also encountered a new issue: when swappiness is set to
> the MAX_SWAPPINESS, it may still only reclaim file folios.
> 
> So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> proactive memory reclaim only reclaims from anonymous folios when
> swappiness is set to max. The swappiness semantics from a user
> perspective remain unchanged.
> 
> For example, something like this:
> 
> echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> 
> will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> new mode) regardless of the file folios. Users have a more comprehensive
> view of the application's memory distribution because there are many
> metrics available. For example, if we find that a certain cgroup has a
> large number of inactive anon folios, we can reclaim only those and skip
> file folios, because with the zram/zswap, the IO tradeoff that
> cache_trim_mode or other file first logic is making doesn't hold -
> file refaults will cause IO, whereas anon decompression will not.
> 
> With this patch, the swappiness argument of memory.reclaim has a new
> mode 'max', means reclaiming just from anonymous folios both in traditional
> LRU and MGLRU.

Is MGLRU handled in this patch?

> 
> Here is the previous discussion:
> https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
> https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
> 
> Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  4 ++++
>  include/linux/swap.h                    |  4 ++++
>  mm/memcontrol.c                         |  5 +++++
>  mm/vmscan.c                             | 10 ++++++++++
>  4 files changed, 23 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index cb1b4e759b7e..c39ef4314499 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1343,6 +1343,10 @@ The following nested keys are defined.
>  	same semantics as vm.swappiness applied to memcg reclaim with
>  	all the existing limitations and potential future extensions.
>  
> +	If set swappiness=max, memory reclamation will exclusively
> +	target the anonymous folio list for both traditional LRU and
> +	MGLRU reclamation algorithms.
> +

I don't think we need to specify LRU and MGLRU here. What about:

Setting swappiness=max exclusively reclaims anonymous memory.

>    memory.peak
>  	A read-write single value file which exists on non-root cgroups.
>  
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b13b72645db3..a94efac10fe5 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -419,6 +419,10 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
>  #define MIN_SWAPPINESS 0
>  #define MAX_SWAPPINESS 200
> +
> +/* Just recliam from anon folios in proactive memory reclaim */
> +#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)
> +

This is a swappiness value so let's keep that clear, e.g.
SWAPPINESS_ANON_ONLY or similar.

>  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>  						  unsigned long nr_pages,
>  						  gfp_t gfp_mask,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4de6acb9b8ec..0d0400f141d1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4291,11 +4291,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>  
>  enum {
>  	MEMORY_RECLAIM_SWAPPINESS = 0,
> +	MEMORY_RECLAIM_ONLY_ANON_MODE,
>  	MEMORY_RECLAIM_NULL,
>  };
>  
>  static const match_table_t tokens = {
>  	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
> +	{ MEMORY_RECLAIM_ONLY_ANON_MODE, "swappiness=max"},

MEMORY_RECLAIM_SWAPPINESS_MAX?

>  	{ MEMORY_RECLAIM_NULL, NULL },
>  };
>  
> @@ -4329,6 +4331,9 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>  			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
>  				return -EINVAL;
>  			break;
> +		case MEMORY_RECLAIM_ONLY_ANON_MODE:
> +			swappiness = ONLY_ANON_RECLAIM_MODE;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c767d71c43d7..779a9a3cf715 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2438,6 +2438,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  		goto out;
>  	}
>  
> +	/*
> +	 * Do not bother scanning file folios if the memory reclaim
> +	 * invoked by userspace through memory.reclaim and set
> +	 * 'swappiness=max'.
> +	 */

/* Proactive reclaim initiated by userspace for anonymous memory only */

> +	if (sc->proactive && (swappiness == ONLY_ANON_RECLAIM_MODE)) {

Do we need to check sc->proactive here? Supposedly this swappiness value
can only be passed in from proactive reclaim. Instead of silently
ignoring the value from other paths, I wonder if we should WARN on
!sc->proactive instead.

> +		scan_balance = SCAN_ANON;
> +		goto out;
> +	}
> +
>  	/*
>  	 * Do not apply any pressure balancing cleverness when the
>  	 * system is close to OOM, scan both anon and file equally
> -- 
> 2.39.5
>
Zhongkun He March 19, 2025, 2:34 a.m. UTC | #2
On Tue, Mar 18, 2025 at 10:10 PM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> On Tue, Mar 18, 2025 at 09:53:30PM +0800, Zhongkun He wrote:
> > With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> > memory.reclaim")', we can submit an additional swappiness=<val> argument
> > to memory.reclaim. It is very useful because we can dynamically adjust
> > the reclamation ratio based on the anonymous folios and file folios of
> > each cgroup. For example,when swappiness is set to 0, we only reclaim
> > from file folios.
> >
> > However,we have also encountered a new issue: when swappiness is set to
> > the MAX_SWAPPINESS, it may still only reclaim file folios.
> >
> > So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> > proactive memory reclaim only reclaims from anonymous folios when
> > swappiness is set to max. The swappiness semantics from a user
> > perspective remain unchanged.
> >
> > For example, something like this:
> >
> > echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> >
> > will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> > new mode) regardless of the file folios. Users have a more comprehensive
> > view of the application's memory distribution because there are many
> > metrics available. For example, if we find that a certain cgroup has a
> > large number of inactive anon folios, we can reclaim only those and skip
> > file folios, because with the zram/zswap, the IO tradeoff that
> > cache_trim_mode or other file first logic is making doesn't hold -
> > file refaults will cause IO, whereas anon decompression will not.
> >
> > With this patch, the swappiness argument of memory.reclaim has a new
> > mode 'max', means reclaiming just from anonymous folios both in traditional
> > LRU and MGLRU.
>
> Is MGLRU handled in this patch?

Yes. The value of ONLY_ANON_RECLAIM_MODE is 201, and MGLRU selects the
evictable types like this:

#define evictable_min_seq(min_seq, swappiness)              \
    min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])

#define for_each_evictable_type(type, swappiness)           \
    for ((type) = !(swappiness); (type) <= ((swappiness) <= MAX_SWAPPINESS); (type)++)

If swappiness=0, the loop starts at type 1, so the only type is
LRU_GEN_FILE(1).

If swappiness=201 (> MAX_SWAPPINESS), the loop becomes
  for ((type) = 0; (type) <= 0; (type)++)
so the type is always LRU_GEN_ANON(0).
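
The loop bounds can be checked outside the kernel with a small
standalone model. This is only a sketch: the macro is copied from the
text above, while evictable_mask() is a hypothetical helper added here
to record which types the loop visits.

```c
#define MAX_SWAPPINESS 200

/* Type indices as given in the message above. */
enum { LRU_GEN_ANON = 0, LRU_GEN_FILE = 1 };

/* Copied from the macro quoted above. */
#define for_each_evictable_type(type, swappiness)           \
	for ((type) = !(swappiness);                        \
	     (type) <= ((swappiness) <= MAX_SWAPPINESS);    \
	     (type)++)

/*
 * Hypothetical helper: return a bitmask of the types the loop visits,
 * bit LRU_GEN_ANON for anon and bit LRU_GEN_FILE for file.
 */
static unsigned int evictable_mask(int swappiness)
{
	unsigned int mask = 0;
	int type;

	for_each_evictable_type(type, swappiness)
		mask |= 1u << type;

	return mask;
}
```

In this model, swappiness 0 yields only the file bit, values 1 through
200 yield both bits, and 201 yields only the anon bit, which matches
the walk-through above.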

>
> >
> > Here is the previous discussion:
> > https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
> > https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
> >
> > Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  4 ++++
> >  include/linux/swap.h                    |  4 ++++
> >  mm/memcontrol.c                         |  5 +++++
> >  mm/vmscan.c                             | 10 ++++++++++
> >  4 files changed, 23 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index cb1b4e759b7e..c39ef4314499 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1343,6 +1343,10 @@ The following nested keys are defined.
> >       same semantics as vm.swappiness applied to memcg reclaim with
> >       all the existing limitations and potential future extensions.
> >
> > +     If set swappiness=max, memory reclamation will exclusively
> > +     target the anonymous folio list for both traditional LRU and
> > +     MGLRU reclamation algorithms.
> > +
>
> I don't think we need to specify LRU and MGLRU here. What about:
>
> Setting swappiness=max exclusively reclaims anonymous memory.
>

Agree, thanks.

> >    memory.peak
> >       A read-write single value file which exists on non-root cgroups.
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index b13b72645db3..a94efac10fe5 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -419,6 +419,10 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >  #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
> >  #define MIN_SWAPPINESS 0
> >  #define MAX_SWAPPINESS 200
> > +
> > +/* Just recliam from anon folios in proactive memory reclaim */
> > +#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)
> > +
>
> This is a swappiness value so let's keep that clear, e.g.
> SWAPPINESS_ANON_ONLY or similar.
>

OK.

> >  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> >                                                 unsigned long nr_pages,
> >                                                 gfp_t gfp_mask,
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4de6acb9b8ec..0d0400f141d1 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4291,11 +4291,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
> >
> >  enum {
> >       MEMORY_RECLAIM_SWAPPINESS = 0,
> > +     MEMORY_RECLAIM_ONLY_ANON_MODE,
> >       MEMORY_RECLAIM_NULL,
> >  };
> >
> >  static const match_table_t tokens = {
> >       { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
> > +     { MEMORY_RECLAIM_ONLY_ANON_MODE, "swappiness=max"},
>
> MEMORY_RECLAIM_SWAPPINESS_MAX?
>

OK.

> >       { MEMORY_RECLAIM_NULL, NULL },
> >  };
> >
> > @@ -4329,6 +4331,9 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> >                       if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
> >                               return -EINVAL;
> >                       break;
> > +             case MEMORY_RECLAIM_ONLY_ANON_MODE:
> > +                     swappiness = ONLY_ANON_RECLAIM_MODE;
> > +                     break;
> >               default:
> >                       return -EINVAL;
> >               }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c767d71c43d7..779a9a3cf715 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2438,6 +2438,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >               goto out;
> >       }
> >
> > +     /*
> > +      * Do not bother scanning file folios if the memory reclaim
> > +      * invoked by userspace through memory.reclaim and set
> > +      * 'swappiness=max'.
> > +      */
>
> /* Proactive reclaim initiated by userspace for anonymous memory only */
>

Looks clearer.

> > +     if (sc->proactive && (swappiness == ONLY_ANON_RECLAIM_MODE)) {
>
> Do we need to check sc->proactive here? Supposedly this swappiness value
> can only be passed in from proactive reclaim. Instead of silently
> ignoring the value from other paths, I wonder if we should WARN on
> !sc->proactive instead.
>

I was also unsure how to handle this check. WARN looks good.

> > +             scan_balance = SCAN_ANON;
> > +             goto out;
> > +     }
> > +
> >       /*
> >        * Do not apply any pressure balancing cleverness when the
> >        * system is close to OOM, scan both anon and file equally
> > --
> > 2.39.5
> >
Yosry Ahmed March 19, 2025, 5:28 a.m. UTC | #3
On Wed, Mar 19, 2025 at 10:34:54AM +0800, Zhongkun He wrote:
> On Tue, Mar 18, 2025 at 10:10 PM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
> >
> > On Tue, Mar 18, 2025 at 09:53:30PM +0800, Zhongkun He wrote:
> > > With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> > > memory.reclaim")', we can submit an additional swappiness=<val> argument
> > > to memory.reclaim. It is very useful because we can dynamically adjust
> > > the reclamation ratio based on the anonymous folios and file folios of
> > > each cgroup. For example,when swappiness is set to 0, we only reclaim
> > > from file folios.
> > >
> > > However,we have also encountered a new issue: when swappiness is set to
> > > the MAX_SWAPPINESS, it may still only reclaim file folios.
> > >
> > > So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> > > proactive memory reclaim only reclaims from anonymous folios when
> > > swappiness is set to max. The swappiness semantics from a user
> > > perspective remain unchanged.
> > >
> > > For example, something like this:
> > >
> > > echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> > >
> > > will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> > > new mode) regardless of the file folios. Users have a more comprehensive
> > > view of the application's memory distribution because there are many
> > > metrics available. For example, if we find that a certain cgroup has a
> > > large number of inactive anon folios, we can reclaim only those and skip
> > > file folios, because with the zram/zswap, the IO tradeoff that
> > > cache_trim_mode or other file first logic is making doesn't hold -
> > > file refaults will cause IO, whereas anon decompression will not.
> > >
> > > With this patch, the swappiness argument of memory.reclaim has a new
> > > mode 'max', means reclaiming just from anonymous folios both in traditional
> > > LRU and MGLRU.
> >
> > Is MGLRU handled in this patch?
> 
> Yes, The value of ONLY_ANON_RECLAIM_MODE is 201, and the MGLRU select the
> evictable type like this:
> 
> #define evictable_min_seq(min_seq, swappiness)              \
>     min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])
> 
> #define for_each_evictable_type(type, swappiness)           \
>     for ((type) = !(swappiness); (type) <= ((swappiness) <=
> MAX_SWAPPINESS); (type)++)
> 
> if the swappiness=0, the type is LRU_GEN_FILE(1);
> 
> if the swappiness=201 (>MAX_SWAPPINESS),
>   for ((type) = 0; (type) <= 0); (type)++)
> The type is always LRU_GEN_ANON(0).

Zhongkun, I see that you already sent a new version. Please wait until
discussions on a patch are resolved before sending out newer versions,
and allow more time for reviews in general.

I think this is too subtle, and it's easy to miss. Looking at the MGLRU
code it seems like there's a lot of swappiness <= MAX_SWAPPINESS checks,
and I am not sure why these already exist given that swappiness should
never exceed MAX_SWAPPINESS before this change.

Are there other parts of the MGLRU code that are already using
swappiness values > MAX_SWAPPINESS?

Yu, could you help us make things clearer here? I would like to avoid
relying on current implementation details that could easily be missed
when making changes. Ideally we'd explicitly check for
SWAPPINESS_ANON_ONLY.
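
One way to read the suggestion of an explicit check combined with a
warning is sketched below as a standalone model. It is not the code
from the patch: struct scan_control_model, anon_only_balance() and the
nr_warnings counter (standing in for a kernel WARN) are assumptions for
illustration, and SWAPPINESS_ANON_ONLY is the name proposed in review
rather than the one in the posted diff.

```c
#include <stdbool.h>

#define MAX_SWAPPINESS 200
/* Name proposed in review; the posted patch uses ONLY_ANON_RECLAIM_MODE. */
#define SWAPPINESS_ANON_ONLY (MAX_SWAPPINESS + 1)

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Reduced stand-in for the kernel's struct scan_control. */
struct scan_control_model {
	bool proactive;
};

/* Stand-in for a kernel WARN: counts unexpected callers. */
static int nr_warnings;

/*
 * Explicitly test for the anon-only value instead of relying on
 * "swappiness > MAX_SWAPPINESS". Returns true and sets *balance when
 * anon-only mode applies; warns if the caller is not proactive reclaim.
 */
static bool anon_only_balance(const struct scan_control_model *sc,
			      int swappiness, enum scan_balance *balance)
{
	if (swappiness != SWAPPINESS_ANON_ONLY)
		return false;

	if (!sc->proactive)
		nr_warnings++;

	*balance = SCAN_ANON;
	return true;
}
```

The point of the design is that a non-proactive caller passing the
special value becomes visible through the warning rather than being
silently ignored.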
Zhongkun He March 19, 2025, 12:52 p.m. UTC | #4
On Wed, Mar 19, 2025 at 1:29 PM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> On Wed, Mar 19, 2025 at 10:34:54AM +0800, Zhongkun He wrote:
> > On Tue, Mar 18, 2025 at 10:10 PM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
> > >
> > > On Tue, Mar 18, 2025 at 09:53:30PM +0800, Zhongkun He wrote:
> > > > With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> > > > memory.reclaim")', we can submit an additional swappiness=<val> argument
> > > > to memory.reclaim. It is very useful because we can dynamically adjust
> > > > the reclamation ratio based on the anonymous folios and file folios of
> > > > each cgroup. For example,when swappiness is set to 0, we only reclaim
> > > > from file folios.
> > > >
> > > > However,we have also encountered a new issue: when swappiness is set to
> > > > the MAX_SWAPPINESS, it may still only reclaim file folios.
> > > >
> > > > So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> > > > proactive memory reclaim only reclaims from anonymous folios when
> > > > swappiness is set to max. The swappiness semantics from a user
> > > > perspective remain unchanged.
> > > >
> > > > For example, something like this:
> > > >
> > > > echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> > > >
> > > > will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> > > > new mode) regardless of the file folios. Users have a more comprehensive
> > > > view of the application's memory distribution because there are many
> > > > metrics available. For example, if we find that a certain cgroup has a
> > > > large number of inactive anon folios, we can reclaim only those and skip
> > > > file folios, because with the zram/zswap, the IO tradeoff that
> > > > cache_trim_mode or other file first logic is making doesn't hold -
> > > > file refaults will cause IO, whereas anon decompression will not.
> > > >
> > > > With this patch, the swappiness argument of memory.reclaim has a new
> > > > mode 'max', means reclaiming just from anonymous folios both in traditional
> > > > LRU and MGLRU.
> > >
> > > Is MGLRU handled in this patch?
> >
> > Yes, The value of ONLY_ANON_RECLAIM_MODE is 201, and the MGLRU select the
> > evictable type like this:
> >
> > #define evictable_min_seq(min_seq, swappiness)              \
> >     min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])
> >
> > #define for_each_evictable_type(type, swappiness)           \
> >     for ((type) = !(swappiness); (type) <= ((swappiness) <=
> > MAX_SWAPPINESS); (type)++)
> >
> > if the swappiness=0, the type is LRU_GEN_FILE(1);
> >
> > if the swappiness=201 (>MAX_SWAPPINESS),
> >   for ((type) = 0; (type) <= 0); (type)++)
> > The type is always LRU_GEN_ANON(0).
>
> Zhongkun, I see that you already sent a new version. Please wait until
> discussions on a patch are resolved before sending out newer versions,
> and allow more time for reviews in general.

Got it, thanks.

>
> I think this is too subtle, and it's easy to miss. Looking at the MGLRU
> code it seems like there's a lot of swappiness <= MAX_SWAPPINESS checks,
> and I am not sure why these already exist given that swappiness should
> never exceed MAX_SWAPPINESS before this change.
>
> Are there other parts of the MGLRU code that are already using
> swappiness values > MAX_SWAPPINESS?

IIUC, MGLRU can already use the value MAX_SWAPPINESS + 1 to
reclaim only anonymous folios. Please have a look:

In lru_gen_seq_write()->run_cmd():
    else if (swappiness > MAX_SWAPPINESS + 1)
        goto done;  /* so MAX_SWAPPINESS + 1 is OK */

In inc_min_seq():
    if (type ? swappiness > MAX_SWAPPINESS : !swappiness)
        goto done;  /* skip LRU_GEN_FILE when swappiness is MAX_SWAPPINESS + 1 */

In for_each_evictable_type(), which also skips LRU_GEN_FILE when
swappiness is MAX_SWAPPINESS + 1:
    #define for_each_evictable_type(type, swappiness)           \
        for ((type) = !(swappiness); (type) <= ((swappiness) <= MAX_SWAPPINESS); (type)++)

So /sys/kernel/debug/lru_gen can accept a swappiness value of
MAX_SWAPPINESS + 1 for proactive reclamation, meaning it only reclaims
anonymous pages.

But the above is just my guess. It would be great if Yu could clarify.
If my description is incorrect, please correct me.

>
> Yu, could you help us making things clearer here? I would like to avoid
> relying on current implementation details that could easily be missed
> when making changes. Ideally we'd explicitly check for
> SWAPPINESS_ANON_ONLY.
>

Looking forward to Yu's reply.

Thanks.
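
For readers following the mm/memcontrol.c hunk in the patch, the
accepted swappiness arguments can be modeled in userspace roughly as
follows. This is a sketch under assumptions: parse_swappiness_token()
is a hypothetical function that uses strtol() and strcmp() in place of
the kernel's token matching, and it only mirrors the range check and
the 'max' case visible in the diff.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200
#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)

/*
 * Hypothetical userspace model: map one "swappiness=..." token to the
 * value handed to reclaim, or return -EINVAL if it is not accepted.
 */
static int parse_swappiness_token(const char *tok)
{
	static const char prefix[] = "swappiness=";
	const char *val;
	char *end;
	long n;

	if (strncmp(tok, prefix, sizeof(prefix) - 1) != 0)
		return -EINVAL;
	val = tok + sizeof(prefix) - 1;

	/* The new keyword maps to the internal anon-only value. */
	if (strcmp(val, "max") == 0)
		return ONLY_ANON_RECLAIM_MODE;

	errno = 0;
	n = strtol(val, &end, 10);
	if (end == val || *end != '\0' || errno == ERANGE)
		return -EINVAL;

	/* Numeric values stay limited to the existing range. */
	if (n < MIN_SWAPPINESS || n > MAX_SWAPPINESS)
		return -EINVAL;

	return (int)n;
}
```

Note that a numeric "swappiness=201" is still rejected in this model,
so the anon-only value is reachable only through the 'max' keyword.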

Patch

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cb1b4e759b7e..c39ef4314499 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1343,6 +1343,10 @@  The following nested keys are defined.
 	same semantics as vm.swappiness applied to memcg reclaim with
 	all the existing limitations and potential future extensions.
 
+	If set swappiness=max, memory reclamation will exclusively
+	target the anonymous folio list for both traditional LRU and
+	MGLRU reclamation algorithms.
+
   memory.peak
 	A read-write single value file which exists on non-root cgroups.
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db3..a94efac10fe5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,6 +419,10 @@  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
+
+/* Just reclaim from anon folios in proactive memory reclaim */
+#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)
+
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..0d0400f141d1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4291,11 +4291,13 @@  static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_ONLY_ANON_MODE,
 	MEMORY_RECLAIM_NULL,
 };
 
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_ONLY_ANON_MODE, "swappiness=max"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -4329,6 +4331,9 @@  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
 				return -EINVAL;
 			break;
+		case MEMORY_RECLAIM_ONLY_ANON_MODE:
+			swappiness = ONLY_ANON_RECLAIM_MODE;
+			break;
 		default:
 			return -EINVAL;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..779a9a3cf715 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2438,6 +2438,16 @@  static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		goto out;
 	}
 
+	/*
+	 * Do not bother scanning file folios if the memory reclaim
+	 * invoked by userspace through memory.reclaim and set
+	 * 'swappiness=max'.
+	 */
+	if (sc->proactive && (swappiness == ONLY_ANON_RECLAIM_MODE)) {
+		scan_balance = SCAN_ANON;
+		goto out;
+	}
+
 	/*
 	 * Do not apply any pressure balancing cleverness when the
 	 * system is close to OOM, scan both anon and file equally