mm: Do not start/end writeback for pages stored in zswap

Message ID	20240610143037.812955-1-usamaarif642@gmail.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Usama Arif <usamaarif642@gmail.com> To: akpm@linux-foundation.org Cc: willy@infradead.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Usama Arif <usamaarif642@gmail.com> Subject: [PATCH] mm: Do not start/end writeback for pages stored in zswap Date: Mon, 10 Jun 2024 15:30:37 +0100 Message-ID: <20240610143037.812955-1-usamaarif642@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	mm: Do not start/end writeback for pages stored in zswap \| expand mm: Do not start/end writeback for pages stored in zswap

Usama Arif June 10, 2024, 2:30 p.m. UTC

start/end writeback combination incorrectly increments NR_WRITTEN
counter, eventhough the pages aren't written to disk. Pages successfully
stored in zswap should just unlock folio and return from writepage.

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 mm/page_io.c | 2 --
 1 file changed, 2 deletions(-)

Yosry Ahmed June 10, 2024, 5:31 p.m. UTC | #1

On Mon, Jun 10, 2024 at 7:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
> start/end writeback combination incorrectly increments NR_WRITTEN
> counter, eventhough the pages aren't written to disk. Pages successfully
> stored in zswap should just unlock folio and return from writepage.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
>  mm/page_io.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index a360857cf75d..501784d79977 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>                 return ret;
>         }
>         if (zswap_store(folio)) {
> -               folio_start_writeback(folio);
>                 folio_unlock(folio);
> -               folio_end_writeback(folio);

Removing these calls will have several effects, I am not really sure it's safe.

1. As you note in the commit log, NR_WRITTEN stats (and apparently
others) will no longer be updated. While this may make sense, it's a
user-visible change. I am not sure if anyone relies on this.

2. folio_end_writeback() calls folio_rotate_reclaimable() after
writeback completes to put a folio that has been marked with
PG_reclaim at the tail of the LRU, to be reclaimed first next time. Do
we get this call through other paths now?

3. If I remember correctly, there was some sort of state machine where
folios go from dirty to writeback to clean. I am not sure what happens
if we take the writeback phase out of the equation.

Overall, the change seems like it will special case the folios written
to zswap vs. to disk further, and we may end up missing important
things (like folio_rotate_reclaimable()). I would like to see a much
stronger argument for why it is safe and justified tbh :)

>                 return 0;
>         }
>         if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
> --
> 2.43.0
>

Usama Arif June 10, 2024, 6:11 p.m. UTC | #2

On 10/06/2024 18:31, Yosry Ahmed wrote:
> On Mon, Jun 10, 2024 at 7:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
>> start/end writeback combination incorrectly increments NR_WRITTEN
>> counter, eventhough the pages aren't written to disk. Pages successfully
>> stored in zswap should just unlock folio and return from writepage.
>>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>> ---
>>   mm/page_io.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index a360857cf75d..501784d79977 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>>                  return ret;
>>          }
>>          if (zswap_store(folio)) {
>> -               folio_start_writeback(folio);
>>                  folio_unlock(folio);
>> -               folio_end_writeback(folio);
> Removing these calls will have several effects, I am not really sure it's safe.
>
> 1. As you note in the commit log, NR_WRITTEN stats (and apparently
> others) will no longer be updated. While this may make sense, it's a
> user-visible change. I am not sure if anyone relies on this.

Thanks for the review.

This patch would correct NR_WRITTEN stat, so I think its a good thing? 
But yeah as you said for people relying on that stat, suddenly this 
number would be lowered if they pick up a kernel with this patch, not 
sure how such changes would be dealt with in the kernel.

> 2. folio_end_writeback() calls folio_rotate_reclaimable() after
> writeback completes to put a folio that has been marked with
> PG_reclaim at the tail of the LRU, to be reclaimed first next time. Do
> we get this call through other paths now?

We could add

if (folio_test_reclaim(folio)) {
         folio_clear_reclaim(folio);
         folio_rotate_reclaimable(folio);
     }

to if zswap_store is successful to fix this?

> 3. If I remember correctly, there was some sort of state machine where
> folios go from dirty to writeback to clean. I am not sure what happens
> if we take the writeback phase out of the equation.
>
> Overall, the change seems like it will special case the folios written
> to zswap vs. to disk further, and we may end up missing important
> things (like folio_rotate_reclaimable()). I would like to see a much
> stronger argument for why it is safe and justified tbh :)
>
The patch came about from zero page swap optimization series 
(https://lore.kernel.org/all/ZmcITDhdBzUGEHuY@casper.infradead.org/), 
where Matthew pointed out that NR_WRITTEN would be wrong with the way I 
was doing it.

Overall, I thought it would be good to have consistency with how 
zeropages and zswap pages would be dealt with, and have a more correct 
NR_WRITTEN stat.

In the next revision of the zero page patch, I will just add 
folio_rotate_reclaimable after folio_unlock if folio is zero filled.

>>                  return 0;
>>          }
>>          if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
>> --
>> 2.43.0
>>

Yosry Ahmed June 10, 2024, 6:29 p.m. UTC | #3

On Mon, Jun 10, 2024 at 11:11 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
> On 10/06/2024 18:31, Yosry Ahmed wrote:
> > On Mon, Jun 10, 2024 at 7:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
> >> start/end writeback combination incorrectly increments NR_WRITTEN
> >> counter, eventhough the pages aren't written to disk. Pages successfully
> >> stored in zswap should just unlock folio and return from writepage.
> >>
> >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >> ---
> >>   mm/page_io.c | 2 --
> >>   1 file changed, 2 deletions(-)
> >>
> >> diff --git a/mm/page_io.c b/mm/page_io.c
> >> index a360857cf75d..501784d79977 100644
> >> --- a/mm/page_io.c
> >> +++ b/mm/page_io.c
> >> @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> >>                  return ret;
> >>          }
> >>          if (zswap_store(folio)) {
> >> -               folio_start_writeback(folio);
> >>                  folio_unlock(folio);
> >> -               folio_end_writeback(folio);
> > Removing these calls will have several effects, I am not really sure it's safe.
> >
> > 1. As you note in the commit log, NR_WRITTEN stats (and apparently
> > others) will no longer be updated. While this may make sense, it's a
> > user-visible change. I am not sure if anyone relies on this.
>
> Thanks for the review.
>
> This patch would correct NR_WRITTEN stat, so I think its a good thing?
> But yeah as you said for people relying on that stat, suddenly this
> number would be lowered if they pick up a kernel with this patch, not
> sure how such changes would be dealt with in the kernel.
>
> > 2. folio_end_writeback() calls folio_rotate_reclaimable() after
> > writeback completes to put a folio that has been marked with
> > PG_reclaim at the tail of the LRU, to be reclaimed first next time. Do
> > we get this call through other paths now?
>
> We could add
>
> if (folio_test_reclaim(folio)) {
>          folio_clear_reclaim(folio);
>          folio_rotate_reclaimable(folio);
>      }
>
> to if zswap_store is successful to fix this?
>
> > 3. If I remember correctly, there was some sort of state machine where
> > folios go from dirty to writeback to clean. I am not sure what happens
> > if we take the writeback phase out of the equation.
> >
> > Overall, the change seems like it will special case the folios written
> > to zswap vs. to disk further, and we may end up missing important
> > things (like folio_rotate_reclaimable()). I would like to see a much
> > stronger argument for why it is safe and justified tbh :)
> >
> The patch came about from zero page swap optimization series
> (https://lore.kernel.org/all/ZmcITDhdBzUGEHuY@casper.infradead.org/),
> where Matthew pointed out that NR_WRITTEN would be wrong with the way I
> was doing it.
>
> Overall, I thought it would be good to have consistency with how
> zeropages and zswap pages would be dealt with, and have a more correct
> NR_WRITTEN stat.
>
> In the next revision of the zero page patch, I will just add
> folio_rotate_reclaimable after folio_unlock if folio is zero filled.

I would wait until others weigh in here.

I am not sure we can just change the stat handling from under the
users, even if it is the right thing to do :/

I also think we need more analysis before we decide it's safe to
remove the writeback calls otherwise. I noticed
folio_rotate_reclaimable() on a quick look, but there may be other
problems. I am not very familiar with the dirty -> writeback -> clean
state machine.

What's the benefit of this patch beyond making the code (and stats)
make more sense semantically? It feels like a significant risk with
little reward to me.

Shakeel Butt June 10, 2024, 7:08 p.m. UTC | #4

On Mon, Jun 10, 2024 at 10:31:36AM GMT, Yosry Ahmed wrote:
> On Mon, Jun 10, 2024 at 7:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
> >
> > start/end writeback combination incorrectly increments NR_WRITTEN
> > counter, eventhough the pages aren't written to disk. Pages successfully
> > stored in zswap should just unlock folio and return from writepage.
> >
> > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > ---
> >  mm/page_io.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index a360857cf75d..501784d79977 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> >                 return ret;
> >         }
> >         if (zswap_store(folio)) {
> > -               folio_start_writeback(folio);
> >                 folio_unlock(folio);
> > -               folio_end_writeback(folio);
> 
> Removing these calls will have several effects, I am not really sure it's safe.
> 
> 1. As you note in the commit log, NR_WRITTEN stats (and apparently
> others) will no longer be updated. While this may make sense, it's a
> user-visible change. I am not sure if anyone relies on this.
> 

I couldn't imagine how this stat can be useful for the zswap case and I
don't see much risk in changing this stat behavior for such cases.

> 2. folio_end_writeback() calls folio_rotate_reclaimable() after
> writeback completes to put a folio that has been marked with
> PG_reclaim at the tail of the LRU, to be reclaimed first next time. Do
> we get this call through other paths now?
> 

The folio_rotate_reclaimable() only makes sense for async writeback
pages i.e. not for zswap where we synchronously reclaim the page.

> 3. If I remember correctly, there was some sort of state machine where
> folios go from dirty to writeback to clean. I am not sure what happens
> if we take the writeback phase out of the equation.
> 

Is there really such a state machine? We only trigger writeback if the
page is dirty and we have cleared it. The only thing I can think of is
the behavior of the waiters on PG_locked bit but the window of
PG_writeback is so small that it seems like it does not matter.

Yosry Ahmed June 10, 2024, 8:05 p.m. UTC | #5

On Mon, Jun 10, 2024 at 12:08 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Mon, Jun 10, 2024 at 10:31:36AM GMT, Yosry Ahmed wrote:
> > On Mon, Jun 10, 2024 at 7:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
> > >
> > > start/end writeback combination incorrectly increments NR_WRITTEN
> > > counter, eventhough the pages aren't written to disk. Pages successfully
> > > stored in zswap should just unlock folio and return from writepage.
> > >
> > > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > ---
> > >  mm/page_io.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/mm/page_io.c b/mm/page_io.c
> > > index a360857cf75d..501784d79977 100644
> > > --- a/mm/page_io.c
> > > +++ b/mm/page_io.c
> > > @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> > >                 return ret;
> > >         }
> > >         if (zswap_store(folio)) {
> > > -               folio_start_writeback(folio);
> > >                 folio_unlock(folio);
> > > -               folio_end_writeback(folio);
> >
> > Removing these calls will have several effects, I am not really sure it's safe.
> >
> > 1. As you note in the commit log, NR_WRITTEN stats (and apparently
> > others) will no longer be updated. While this may make sense, it's a
> > user-visible change. I am not sure if anyone relies on this.
> >
>
> I couldn't imagine how this stat can be useful for the zswap case and I
> don't see much risk in changing this stat behavior for such cases.

It seems like NR_WRITTEN is only used in 'global_dirty_state' trace event.

NR_WRITEBACK and NR_ZONE_WRITE_PENDING are state counters, not event
counters. They are incremented in folio_start_writeback() and
decremented in folio_end_writeback(). They are probably just causing
noise.

I think for both cases it's probably fine and not really visible to userspace.

>
> > 2. folio_end_writeback() calls folio_rotate_reclaimable() after
> > writeback completes to put a folio that has been marked with
> > PG_reclaim at the tail of the LRU, to be reclaimed first next time. Do
> > we get this call through other paths now?
> >
>
> The folio_rotate_reclaimable() only makes sense for async writeback
> pages i.e. not for zswap where we synchronously reclaim the page.

Looking at pageout(), it seems like we will clear PG_reclaim if the
folio is not under writeback, and in shrink_folio_list() if the folio
is not dirty or under writeback, we will reclaim right away. I thought
zswap being synchronous was an odd case, but apparently there is wider
support for synchronous reclaim.

Thanks for pointing this out.

>
> > 3. If I remember correctly, there was some sort of state machine where
> > folios go from dirty to writeback to clean. I am not sure what happens
> > if we take the writeback phase out of the equation.
> >
>
> Is there really such a state machine? We only trigger writeback if the
> page is dirty and we have cleared it. The only thing I can think of is
> the behavior of the waiters on PG_locked bit but the window of
> PG_writeback is so small that it seems like it does not matter.

I remember Matthew talking about it during LSF/MM this year when he
was discussing page flags, but maybe I am misremembering.

Johannes Weiner June 10, 2024, 8:15 p.m. UTC | #6

On Mon, Jun 10, 2024 at 03:30:37PM +0100, Usama Arif wrote:
> start/end writeback combination incorrectly increments NR_WRITTEN
> counter, eventhough the pages aren't written to disk. Pages successfully
> stored in zswap should just unlock folio and return from writepage.
> 
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>

I also don't see anything in those start and end calls that wouldn't
be pointless churn for these pages.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Chengming Zhou June 11, 2024, 9:53 a.m. UTC | #7

On 2024/6/10 22:30, Usama Arif wrote:
> start/end writeback combination incorrectly increments NR_WRITTEN
> counter, eventhough the pages aren't written to disk. Pages successfully
> stored in zswap should just unlock folio and return from writepage.
> 
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>

Looks good to me, thanks.

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

> ---
>  mm/page_io.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index a360857cf75d..501784d79977 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>  		return ret;
>  	}
>  	if (zswap_store(folio)) {
> -		folio_start_writeback(folio);
>  		folio_unlock(folio);
> -		folio_end_writeback(folio);
>  		return 0;
>  	}
>  	if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {

Usama Arif June 11, 2024, 3:59 p.m. UTC | #8

On 11/06/2024 10:53, Chengming Zhou wrote:
> On 2024/6/10 22:30, Usama Arif wrote:
>> start/end writeback combination incorrectly increments NR_WRITTEN
>> counter, eventhough the pages aren't written to disk. Pages successfully
>> stored in zswap should just unlock folio and return from writepage.
>>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Looks good to me, thanks.
>
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>


Fororgot to add:

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>

>
>> ---
>>   mm/page_io.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index a360857cf75d..501784d79977 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -196,9 +196,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>>   		return ret;
>>   	}
>>   	if (zswap_store(folio)) {
>> -		folio_start_writeback(folio);
>>   		folio_unlock(folio);
>> -		folio_end_writeback(folio);
>>   		return 0;
>>   	}
>>   	if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {

Shakeel Butt June 11, 2024, 5:16 p.m. UTC | #9

On Mon, Jun 10, 2024 at 03:30:37PM GMT, Usama Arif wrote:
> start/end writeback combination incorrectly increments NR_WRITTEN
> counter, eventhough the pages aren't written to disk. Pages successfully
> stored in zswap should just unlock folio and return from writepage.
> 
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>

If Andrew has not picked this up, send a v2 with more detailed commit
message, particularly why it is safe apply this change. You can use the
explanation given by Yosry in response to my email. Also add text
answering the other questions raised by Yosry.

If Andrew has already picked it up, just request him to update the
commit message.

You can add:

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

Yosry Ahmed June 11, 2024, 5:28 p.m. UTC | #10

On Tue, Jun 11, 2024 at 10:16 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Mon, Jun 10, 2024 at 03:30:37PM GMT, Usama Arif wrote:
> > start/end writeback combination incorrectly increments NR_WRITTEN
> > counter, eventhough the pages aren't written to disk. Pages successfully
> > stored in zswap should just unlock folio and return from writepage.
> >
> > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>
> If Andrew has not picked this up, send a v2 with more detailed commit
> message, particularly why it is safe apply this change. You can use the
> explanation given by Yosry in response to my email. Also add text
> answering the other questions raised by Yosry.
>
> If Andrew has already picked it up, just request him to update the
> commit message.

Yes please, thanks Shakeel. I wanted to ask for this but got derailed :)

With the updated commit log feel free to add:
Acked-by: Yosry Ahmed <yosryahmed@google.com>

>
> You can add:
>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>

mm: Do not start/end writeback for pages stored in zswap

Commit Message

Comments

Patch