diff mbox series

[v2,4/9] mm/swap: always account swapped in page into current memcg

Message ID 20240102175338.62012-5-ryncsn@gmail.com (mailing list archive)
State New
Headers show
Series swapin refactor for optimization and unified readahead | expand

Commit Message

Kairui Song Jan. 2, 2024, 5:53 p.m. UTC
From: Kairui Song <kasong@tencent.com>

Currently, mem_cgroup_swapin_charge_folio is always called with
mm argument as NULL, except in swapin_direct.

swapin_direct is used when swapin should skip readahead and
swapcache (SWP_SYNCHRONOUS_IO). Other caller paths of
mem_cgroup_swapin_charge_folio are for swapin that should
not skip readahead and cache.

This could cause swapin charging to behave differently depending
on swap device. This currently didn't happen because the only call
path of swapin_direct is the direct anon page fault path, where mm
equals to current->mm, but will no longer be true if swapin_direct
is shared and have other callers (eg, swapoff).

So make swapin_direct also passes NULL for mm, no feature change.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swap_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Huang, Ying Jan. 5, 2024, 7:14 a.m. UTC | #1
Kairui Song <ryncsn@gmail.com> writes:

> From: Kairui Song <kasong@tencent.com>
>
> Currently, mem_cgroup_swapin_charge_folio is always called with
> mm argument as NULL, except in swapin_direct.
>
> swapin_direct is used when swapin should skip readahead and
> swapcache (SWP_SYNCHRONOUS_IO). Other caller paths of
> mem_cgroup_swapin_charge_folio are for swapin that should
> not skip readahead and cache.
>
> This could cause swapin charging to behave differently depending
> on swap device. This currently didn't happen because the only call
> path of swapin_direct is the direct anon page fault path, where mm
> equals to current->mm, but will no longer be true if swapin_direct
> is shared and have other callers (eg, swapoff).
>
> So make swapin_direct also passes NULL for mm, no feature change.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/swap_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 6130de8d5226..d39c5369da21 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -881,7 +881,7 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
>  	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
>  				vma, vmf->address, false);
>  	if (folio) {
> -		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
> +		if (mem_cgroup_swapin_charge_folio(folio, NULL,
>  						   GFP_KERNEL, entry)) {
>  			folio_put(folio);
>  			return NULL;

I think that why not provide "mm" when it's available?  For
swapin_direct() called by do_swap_page(), mm can be provided.  While,
for swapin_direct() called by shmem swapin, mm will be NULL.  We can
even provide "mm" for __read_swap_cache_async() for VMA based swapin and
for the fault address for cluster swapin.

--
Best Regards,
Huang, Ying
Kairui Song Jan. 5, 2024, 7:33 a.m. UTC | #2
Huang, Ying <ying.huang@intel.com> 于2024年1月5日周五 15:16写道:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > From: Kairui Song <kasong@tencent.com>
> >
> > Currently, mem_cgroup_swapin_charge_folio is always called with
> > mm argument as NULL, except in swapin_direct.
> >
> > swapin_direct is used when swapin should skip readahead and
> > swapcache (SWP_SYNCHRONOUS_IO). Other caller paths of
> > mem_cgroup_swapin_charge_folio are for swapin that should
> > not skip readahead and cache.
> >
> > This could cause swapin charging to behave differently depending
> > on swap device. This currently didn't happen because the only call
> > path of swapin_direct is the direct anon page fault path, where mm
> > equals to current->mm, but will no longer be true if swapin_direct
> > is shared and have other callers (eg, swapoff).
> >
> > So make swapin_direct also passes NULL for mm, no feature change.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> >  mm/swap_state.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 6130de8d5226..d39c5369da21 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -881,7 +881,7 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> >       folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> >                               vma, vmf->address, false);
> >       if (folio) {
> > -             if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
> > +             if (mem_cgroup_swapin_charge_folio(folio, NULL,
> >                                                  GFP_KERNEL, entry)) {
> >                       folio_put(folio);
> >                       return NULL;
>
> I think that why not provide "mm" when it's available?  For
> swapin_direct() called by do_swap_page(), mm can be provided.  While,
> for swapin_direct() called by shmem swapin, mm will be NULL.  We can
> even provide "mm" for __read_swap_cache_async() for VMA based swapin and
> for the fault address for cluster swapin.

Hi, Ying.

Thanks for the comment.

Without modifying too much code, providing mm here will change swapin
charge behaviour on swapoff, we discussed it previously:
https://lkml.org/lkml/2023/11/19/320

If we want to avoid the behavior change, we have to extend all direct
and indirect callers of mem_cgroup_swapin_charge_folio to accept a
standalone mm argument (including swapin_direct, swap_vma_readahead,
swap_cluster_readahead, __read_swap_cache_async, swapin_entry_mpol,
read_swap_cache_async, and some other path may need more audit), and
sort things our case by case. I'm not sure if that's a good idea...

Simply dropping it here seems the easiest way to avoid such change.
Huang, Ying Jan. 8, 2024, 7:44 a.m. UTC | #3
Kairui Song <ryncsn@gmail.com> writes:

> Huang, Ying <ying.huang@intel.com> 于2024年1月5日周五 15:16写道:
>>
>> Kairui Song <ryncsn@gmail.com> writes:
>>
>> > From: Kairui Song <kasong@tencent.com>
>> >
>> > Currently, mem_cgroup_swapin_charge_folio is always called with
>> > mm argument as NULL, except in swapin_direct.
>> >
>> > swapin_direct is used when swapin should skip readahead and
>> > swapcache (SWP_SYNCHRONOUS_IO). Other caller paths of
>> > mem_cgroup_swapin_charge_folio are for swapin that should
>> > not skip readahead and cache.
>> >
>> > This could cause swapin charging to behave differently depending
>> > on swap device. This currently didn't happen because the only call
>> > path of swapin_direct is the direct anon page fault path, where mm
>> > equals to current->mm, but will no longer be true if swapin_direct
>> > is shared and have other callers (eg, swapoff).
>> >
>> > So make swapin_direct also passes NULL for mm, no feature change.
>> >
>> > Signed-off-by: Kairui Song <kasong@tencent.com>
>> > ---
>> >  mm/swap_state.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index 6130de8d5226..d39c5369da21 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -881,7 +881,7 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
>> >       folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
>> >                               vma, vmf->address, false);
>> >       if (folio) {
>> > -             if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
>> > +             if (mem_cgroup_swapin_charge_folio(folio, NULL,
>> >                                                  GFP_KERNEL, entry)) {
>> >                       folio_put(folio);
>> >                       return NULL;
>>
>> I think that why not provide "mm" when it's available?  For
>> swapin_direct() called by do_swap_page(), mm can be provided.  While,
>> for swapin_direct() called by shmem swapin, mm will be NULL.  We can
>> even provide "mm" for __read_swap_cache_async() for VMA based swapin and
>> for the fault address for cluster swapin.
>
> Hi, Ying.
>
> Thanks for the comment.
>
> Without modifying too much code, providing mm here will change swapin
> charge behaviour on swapoff, we discussed it previously:
> https://lkml.org/lkml/2023/11/19/320

It's better to use "lore" for kernel email link, for example,

https://lore.kernel.org/lkml/20231119194740.94101-24-ryncsn@gmail.com/

This is more readable than the above link.

> If we want to avoid the behavior change, we have to extend all direct
> and indirect callers of mem_cgroup_swapin_charge_folio to accept a
> standalone mm argument (including swapin_direct, swap_vma_readahead,
> swap_cluster_readahead, __read_swap_cache_async, swapin_entry_mpol,
> read_swap_cache_async, and some other path may need more audit), and
> sort things our case by case. I'm not sure if that's a good idea...
>
> Simply dropping it here seems the easiest way to avoid such change.

OK.  Didn't realize that they are related directly.  It appears that we
always pass NULL as mm to mem_cgroup_swapin_charge_folio().  If so,
please just remove the parameter.

--
Best Regards,
Huang, Ying
Kairui Song Jan. 9, 2024, 9:42 a.m. UTC | #4
Huang, Ying <ying.huang@intel.com> 于2024年1月8日周一 15:46写道:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > Huang, Ying <ying.huang@intel.com> 于2024年1月5日周五 15:16写道:
> >>
> >> Kairui Song <ryncsn@gmail.com> writes:
> >>
> >> > From: Kairui Song <kasong@tencent.com>
> >> >
> >> > Currently, mem_cgroup_swapin_charge_folio is always called with
> >> > mm argument as NULL, except in swapin_direct.
> >> >
> >> > swapin_direct is used when swapin should skip readahead and
> >> > swapcache (SWP_SYNCHRONOUS_IO). Other caller paths of
> >> > mem_cgroup_swapin_charge_folio are for swapin that should
> >> > not skip readahead and cache.
> >> >
> >> > This could cause swapin charging to behave differently depending
> >> > on swap device. This currently didn't happen because the only call
> >> > path of swapin_direct is the direct anon page fault path, where mm
> >> > equals to current->mm, but will no longer be true if swapin_direct
> >> > is shared and have other callers (eg, swapoff).
> >> >
> >> > So make swapin_direct also passes NULL for mm, no feature change.
> >> >
> >> > Signed-off-by: Kairui Song <kasong@tencent.com>
> >> > ---
> >> >  mm/swap_state.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> >> > index 6130de8d5226..d39c5369da21 100644
> >> > --- a/mm/swap_state.c
> >> > +++ b/mm/swap_state.c
> >> > @@ -881,7 +881,7 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> >> >       folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> >> >                               vma, vmf->address, false);
> >> >       if (folio) {
> >> > -             if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
> >> > +             if (mem_cgroup_swapin_charge_folio(folio, NULL,
> >> >                                                  GFP_KERNEL, entry)) {
> >> >                       folio_put(folio);
> >> >                       return NULL;
> >>
> >> I think that why not provide "mm" when it's available?  For
> >> swapin_direct() called by do_swap_page(), mm can be provided.  While,
> >> for swapin_direct() called by shmem swapin, mm will be NULL.  We can
> >> even provide "mm" for __read_swap_cache_async() for VMA based swapin and
> >> for the fault address for cluster swapin.
> >
> > Hi, Ying.
> >
> > Thanks for the comment.
> >
> > Without modifying too much code, providing mm here will change swapin
> > charge behaviour on swapoff, we discussed it previously:
> > https://lkml.org/lkml/2023/11/19/320
>
> It's better to use "lore" for kernel email link, for example,
>
> https://lore.kernel.org/lkml/20231119194740.94101-24-ryncsn@gmail.com/
>
> This is more readable than the above link.

Hi Ying,

Thanks for your advice.

>
> > If we want to avoid the behavior change, we have to extend all direct
> > and indirect callers of mem_cgroup_swapin_charge_folio to accept a
> > standalone mm argument (including swapin_direct, swap_vma_readahead,
> > swap_cluster_readahead, __read_swap_cache_async, swapin_entry_mpol,
> > read_swap_cache_async, and some other path may need more audit), and
> > sort things our case by case. I'm not sure if that's a good idea...
> >
> > Simply dropping it here seems the easiest way to avoid such change.
>
> OK.  Didn't realize that they are related directly.  It appears that we
> always pass NULL as mm to mem_cgroup_swapin_charge_folio().  If so,
> please just remove the parameter.

Yes, that's also what I wanted.

>
> --
> Best Regards,
> Huang, Ying
diff mbox series

Patch

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 6130de8d5226..d39c5369da21 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -881,7 +881,7 @@  struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
 				vma, vmf->address, false);
 	if (folio) {
-		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
+		if (mem_cgroup_swapin_charge_folio(folio, NULL,
 						   GFP_KERNEL, entry)) {
 			folio_put(folio);
 			return NULL;