
[2/2] mm,swap: skip swap readahead if page was obtained instantaneously

Message ID 20200922020148.3261797-3-riel@surriel.com (mailing list archive)
State New, archived
Series mm,swap: skip swap readahead for instant IO (like zswap)

Commit Message

Rik van Riel Sept. 22, 2020, 2:01 a.m. UTC
Check whether a swap page was obtained instantaneously, for example
because it is in zswap, or on a very fast IO device which uses busy
waiting, and we did not wait on IO to swap in this page.

If no IO was needed to get the swap page we want, kicking off readahead
on surrounding swap pages is likely to be counterproductive, because the
extra loads will cause additional latency, use up extra memory, and chances
are the surrounding pages in swap are just as fast to load as this one,
making readahead pointless.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 mm/swap_state.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
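
The PageUptodate() check used below works as the "did we wait on IO" signal because a zswap/frontswap hit completes synchronously inside swap_readpage(): when frontswap_load() succeeds, the page is filled and marked uptodate before the function returns, and no bio is ever submitted. A simplified, paraphrased sketch of that path in the mainline swap_readpage() of this era (not part of this patch):

	int swap_readpage(struct page *page, bool synchronous)
	{
		...
		/* zswap/frontswap hit: the page is filled right here */
		if (frontswap_load(page) == 0) {
			SetPageUptodate(page);
			unlock_page(page);
			goto out;
		}
		/*
		 * Otherwise a bio is submitted; PageUptodate() is only set
		 * on IO completion, typically after this function returns.
		 */
		...
	}

Busy-waiting (SWP_SYNCHRONOUS_IO) devices mentioned in the commit message take a similarly immediate completion path, so they can also satisfy the check.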

Comments

huang ying Sept. 22, 2020, 3:13 a.m. UTC | #1
On Tue, Sep 22, 2020 at 10:02 AM Rik van Riel <riel@surriel.com> wrote:
>
> Check whether a swap page was obtained instantaneously, for example
> because it is in zswap, or on a very fast IO device which uses busy
> waiting, and we did not wait on IO to swap in this page.
> If no IO was needed to get the swap page we want, kicking off readahead
> on surrounding swap pages is likely to be counterproductive, because the
> extra loads will cause additional latency, use up extra memory, and chances
> are the surrounding pages in swap are just as fast to load as this one,
> making readahead pointless.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  mm/swap_state.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index aacb9ba53f63..6919f9d5fe88 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -637,6 +637,7 @@ static struct page *swap_cluster_read_one(swp_entry_t entry,
>  struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>                                 struct vm_fault *vmf)

Why not do this for swap_vma_readahead() too?  swap_cluster_read_one()
can be used in swap_vma_readahead() too.

>  {
> +       struct page *page;
>         unsigned long entry_offset = swp_offset(entry);
>         unsigned long offset = entry_offset;
>         unsigned long start_offset, end_offset;
> @@ -668,11 +669,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>                 end_offset = si->max - 1;
>
>         blk_start_plug(&plug);
> +       /* If we read the page without waiting on IO, skip readahead. */
> +       page = swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, false);
> +       if (page && PageUptodate(page))
> +               goto skip_unplug;
> +
> +       /* Ok, do the async read-ahead now. */
>         for (offset = start_offset; offset <= end_offset ; offset++) {
> -               /* Ok, do the async read-ahead now */
> -               swap_cluster_read_one(entry, offset, gfp_mask, vma, addr,
> -                                     offset != entry_offset);
> +               if (offset == entry_offset)
> +                       continue;
> +               swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, true);
>         }
> +skip_unplug:
>         blk_finish_plug(&plug);
>
>         lru_add_drain();        /* Push any new pages onto the LRU now */

Best Regards,
Huang, Ying
Rik van Riel Sept. 22, 2020, 11:33 a.m. UTC | #2
On Tue, 2020-09-22 at 11:13 +0800, huang ying wrote:
> On Tue, Sep 22, 2020 at 10:02 AM Rik van Riel <riel@surriel.com>
> wrote:
> > Check whether a swap page was obtained instantaneously, for example
> > because it is in zswap, or on a very fast IO device which uses busy
> > waiting, and we did not wait on IO to swap in this page.
> > If no IO was needed to get the swap page we want, kicking off
> > readahead
> > on surrounding swap pages is likely to be counterproductive,
> > because the
> > extra loads will cause additional latency, use up extra memory, and
> > chances
> > are the surrounding pages in swap are just as fast to load as this
> > one,
> > making readahead pointless.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > ---
> >  mm/swap_state.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index aacb9ba53f63..6919f9d5fe88 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -637,6 +637,7 @@ static struct page
> > *swap_cluster_read_one(swp_entry_t entry,
> >  struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t
> > gfp_mask,
> >                                 struct vm_fault *vmf)
> 
> Why not do this for swap_vma_readahead()
> too?  swap_cluster_read_one()
> can be used in swap_vma_readahead() too.

Good point, I should do the same thing for swap_vma_readahead()
as well. Let me do that and send in a version 2 of the series.
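
For reference, a rough sketch of what that could look like. This is illustrative only, not the posted v2; it assumes swap_cluster_read_one() from patch 1/2 keeps the signature used in the hunk above, and reuses the existing skip: label in swap_vma_readahead():

	/*
	 * Sketch: early exit near the top of swap_vma_readahead(), before
	 * the PTE walk that issues the readahead reads; fentry is the
	 * faulting swap entry.
	 */
	page = swap_cluster_read_one(fentry, swp_offset(fentry), gfp_mask,
				     vmf->vma, vmf->address, false);
	if (page && PageUptodate(page))
		goto skip;	/* got the page without waiting on IO */

	/* ... existing ra_info setup and PTE readahead loop ... */

The existing skip: path then looks the faulting page up in the swap cache again and returns it with a reference held, just as it does today when the readahead window is 1.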
Christoph Hellwig Sept. 23, 2020, 6:35 a.m. UTC | #3
On Mon, Sep 21, 2020 at 10:01:48PM -0400, Rik van Riel wrote:
> +	struct page *page;
>  	unsigned long entry_offset = swp_offset(entry);
>  	unsigned long offset = entry_offset;
>  	unsigned long start_offset, end_offset;
> @@ -668,11 +669,18 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  		end_offset = si->max - 1;
>  
>  	blk_start_plug(&plug);
> +	/* If we read the page without waiting on IO, skip readahead. */
> +	page = swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, false);
> +	if (page && PageUptodate(page))
> +		goto skip_unplug;
> +

At least for the normal block device path the plug will prevent the
I/O submission from actually happening and thus PageUptodate from
becoming true.  I think we need to split the different code paths
more cleanly.

Btw, what device type and media did you test this with?  What kind of
numbers did you get on what workload?
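
One way to address the plugging concern, sketched purely for illustration (not the posted patch or a known v2): issue the initial, non-readahead read before blk_start_plug(), so that a real block device actually dispatches its bio before PageUptodate() is tested, and keep only the readahead reads under the plug:

	/*
	 * Sketch: first read outside the plug.  A zswap/frontswap hit is
	 * filled synchronously and is uptodate immediately; a block-device
	 * read is submitted right away but will normally still be in
	 * flight, so we fall through and do the readahead as before.
	 */
	page = swap_cluster_read_one(entry, entry_offset, gfp_mask, vma, addr, false);
	if (page && PageUptodate(page))
		goto skip;

	blk_start_plug(&plug);
	for (offset = start_offset; offset <= end_offset; offset++) {
		if (offset == entry_offset)
			continue;
		swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, true);
	}
	blk_finish_plug(&plug);
skip:
	lru_add_drain();	/* Push any new pages onto the LRU now */

This keeps the fast no-IO path entirely outside the plugged section, which is one possible reading of "split the different code paths more cleanly"; whether that matches what is intended here would be for a v2 to settle.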

Patch

diff --git a/mm/swap_state.c b/mm/swap_state.c
index aacb9ba53f63..6919f9d5fe88 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -637,6 +637,7 @@  static struct page *swap_cluster_read_one(swp_entry_t entry,
 struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 				struct vm_fault *vmf)
 {
+	struct page *page;
 	unsigned long entry_offset = swp_offset(entry);
 	unsigned long offset = entry_offset;
 	unsigned long start_offset, end_offset;
@@ -668,11 +669,18 @@  struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		end_offset = si->max - 1;
 
 	blk_start_plug(&plug);
+	/* If we read the page without waiting on IO, skip readahead. */
+	page = swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, false);
+	if (page && PageUptodate(page))
+		goto skip_unplug;
+
+	/* Ok, do the async read-ahead now. */
 	for (offset = start_offset; offset <= end_offset ; offset++) {
-		/* Ok, do the async read-ahead now */
-		swap_cluster_read_one(entry, offset, gfp_mask, vma, addr,
-				      offset != entry_offset);
+		if (offset == entry_offset)
+			continue;
+		swap_cluster_read_one(entry, offset, gfp_mask, vma, addr, true);
 	}
+skip_unplug:
 	blk_finish_plug(&plug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */