
[v2,6/9] mm/swap: handle swapcache lookup in swapin_entry

Message ID 20240102175338.62012-7-ryncsn@gmail.com (mailing list archive)
State New
Series swapin refactor for optimization and unified readahead

Commit Message

Kairui Song Jan. 2, 2024, 5:53 p.m. UTC
From: Kairui Song <kasong@tencent.com>

Since all callers of swapin_entry need to check the swap cache first, we
can merge this common routine into swapin_entry, so it can be shared and
optimized later.

Also introduce an enum to better represent possible swap cache usage, and
add some comments about it to make the usage of swap cache easier to
understand.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/memory.c     | 45 ++++++++++++++++++++-------------------------
 mm/swap.h       | 20 ++++++++++++++++++--
 mm/swap_state.c | 22 ++++++++++++++--------
 mm/swapfile.c   | 21 +++++++++------------
 4 files changed, 61 insertions(+), 47 deletions(-)

Comments

Huang, Ying Jan. 8, 2024, 8:26 a.m. UTC | #1
Kairui Song <ryncsn@gmail.com> writes:

> From: Kairui Song <kasong@tencent.com>
>
> Since all callers of swapin_entry need to check the swap cache first, we
> can merge this common routine into swapin_entry, so it can be shared and
> optimized later.
>
> Also introduce an enum to better represent possible swap cache usage, and
> add some comments about it to make the usage of swap cache easier to
> understand.

I don't see any benefit to doing this.  The line count isn't reduced,
and the concept of swap cache isn't hidden either.

--
Best Regards,
Huang, Ying
Kairui Song Jan. 10, 2024, 2:53 a.m. UTC | #2
On Mon, Jan 8, 2024 at 16:28, Huang, Ying <ying.huang@intel.com> wrote:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > From: Kairui Song <kasong@tencent.com>
> >
> > Since all callers of swapin_entry need to check the swap cache first, we
> > can merge this common routine into swapin_entry, so it can be shared and
> > optimized later.
> >
> > Also introduce an enum to better represent possible swap cache usage, and
> > add some comments about it to make the usage of swap cache easier to
> > understand.
>
> I don't see any benefit to doing this.  The line count isn't reduced,
> and the concept of swap cache isn't hidden either.

Hi Ying

Thanks for the comments.

Maybe I should squash this with the following commit? The following
commit wants to do the cache lookup in swapin_entry to avoid a
redundant shadow lookup, which can improve performance by about 4% for
the swapin path. So it needs to return an enum to represent the cache
status.
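
To make this concrete, below is a rough sketch (simplified, not the
exact kernel code) of the calling pattern before and after the merge:

	/* Before: every caller probed the swap cache itself... */
	folio = swap_cache_get_folio(entry, vma, addr);
	if (!folio) {
		/*
		 * ...and the swapin path may probe the same xarray slot
		 * again internally for the shadow entry.
		 */
		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE, &vmf, NULL);
	}

	/* After: a single lookup inside swapin_entry(); the caller
	 * learns what happened through the result argument. */
	folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE, &vmf, &cache_result);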

Furthermore, note the comments added here:

+/*
+ * Callers of swapin_entry may need to know the cache lookup result:
+ *
+ * SWAP_CACHE_HIT: cache hit, the cached folio is returned.
+ * SWAP_CACHE_MISS: cache miss, a folio is allocated, read from the swap
+ *                  device and added to the swap cache, but it may still
+ *                  return a cached folio if raced (check __read_swap_cache_async).
+ * SWAP_CACHE_BYPASS: cache miss, a newly allocated folio is read from
+ *                    the swap device, bypassing the cache.
+ */

SWAP_CACHE_MISS might be inaccurate; this is not an issue introduced
by this commit, but it is better exposed now, and may be worth a fix
later. So far I can see two benefits of fixing it:
- A more accurate maj/min page fault count.
- Note the PageHWPoison check in do_swap_page: it ignores the race
case, so if a page getting poisoned races with the swapcache it may
not work as expected.

These are all minor issues indeed, and some other optimizations might
also be doable, but at least a comment might be helpful.
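
For reference, the accounting in do_swap_page() is keyed off the
lookup result like this (see the memory.c hunk below); if
SWAP_CACHE_MISS can in fact hand back a raced-in cached folio, the
fault is counted as major even though this task did no I/O:

	if (cache_result != SWAP_CACHE_HIT) {
		/* Had to read the page from swap area: Major fault */
		ret = VM_FAULT_MAJOR;
		count_vm_event(PGMAJFAULT);
		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
	}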

> --
> Best Regards,
> Huang, Ying
>
Huang, Ying Jan. 15, 2024, 1:45 a.m. UTC | #3
Kairui Song <ryncsn@gmail.com> writes:

> On Mon, Jan 8, 2024 at 16:28, Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Kairui Song <ryncsn@gmail.com> writes:
>>
>> > From: Kairui Song <kasong@tencent.com>
>> >
>> > Since all callers of swapin_entry need to check the swap cache first, we
>> > can merge this common routine into swapin_entry, so it can be shared and
>> > optimized later.
>> >
>> > Also introduce an enum to better represent possible swap cache usage, and
>> > add some comments about it to make the usage of swap cache easier to
>> > understand.
>>
>> I don't see any benefit to doing this.  The line count isn't reduced,
>> and the concept of swap cache isn't hidden either.
>
> Hi Ying
>
> Thanks for the comments.
>
> Maybe I should squash this with the following commit? The following
> commit wants to do the cache lookup in swapin_entry to avoid a
> redundant shadow lookup, which can improve performance by about 4% for
> the swapin path.

It's good to improve performance.  But I don't think we must put
swap_cache_get_folio() in swapin_entry() to do that.  We can just get
"shadow" from swap_cache_get_folio() and pass it to swapin_entry().

> So it needs to return an enum to represent the cache status.

I don't think we are talking about the same thing here.

> Furthermore, note the comments added here:
>
> +/*
> + * Callers of swapin_entry may need to know the cache lookup result:
> + *
> + * SWAP_CACHE_HIT: cache hit, the cached folio is returned.
> + * SWAP_CACHE_MISS: cache miss, a folio is allocated, read from the swap
> + *                  device and added to the swap cache, but it may still
> + *                  return a cached folio if raced (check __read_swap_cache_async).
> + * SWAP_CACHE_BYPASS: cache miss, a newly allocated folio is read from
> + *                    the swap device, bypassing the cache.
> + */
>
> SWAP_CACHE_MISS might be inaccurate; this is not an issue introduced
> by this commit, but it is better exposed now, and may be worth a fix
> later. So far I can see two benefits of fixing it:
> - A more accurate maj/min page fault count.
> - Note the PageHWPoison check in do_swap_page: it ignores the race
> case, so if a page getting poisoned races with the swapcache it may
> not work as expected.
>
> These are all minor issues indeed, and some other optimizations might
> also be doable, but at least a comment might be helpful.
>

--
Best Regards,
Huang, Ying
Kairui Song Jan. 15, 2024, 5:11 p.m. UTC | #4
On Mon, Jan 15, 2024 at 09:47, Huang, Ying <ying.huang@intel.com> wrote:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > On Mon, Jan 8, 2024 at 16:28, Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> Kairui Song <ryncsn@gmail.com> writes:
> >>
> >> > From: Kairui Song <kasong@tencent.com>
> >> >
> >> > Since all callers of swapin_entry need to check the swap cache first, we
> >> > can merge this common routine into swapin_entry, so it can be shared and
> >> > optimized later.
> >> >
> >> > Also introduce an enum to better represent possible swap cache usage, and
> >> > add some comments about it to make the usage of swap cache easier to
> >> > understand.
> >>
> >> I don't see any benefit to doing this.  The line count isn't reduced,
> >> and the concept of swap cache isn't hidden either.
> >
> > Hi Ying
> >
> > Thanks for the comments.
> >
> > Maybe I should squash this with the following commit? The following
> > commit wants to do the cache lookup in swapin_entry to avoid a
> > redundant shadow lookup, which can improve performance by about 4% for
> > the swapin path.
>
> It's good to improve performance.  But I don't think we must put
> swap_cache_get_folio() in swapin_entry() to do that.  We can just get
> "shadow" from swap_cache_get_folio() and pass it to swapin_entry().

Good idea. My only concern is: if the argument list gets too long
(7 args if we add all of mpol, ilx, and shadow here), will that cause
issues for readability or performance?
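
That is, the signature would grow to roughly the following (a
hypothetical sketch, just to make the argument count visible):

	struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
				   struct mempolicy *mpol, pgoff_t ilx,
				   struct vm_fault *vmf, void *shadow,
				   enum swap_cache_result *result);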

Patch

diff --git a/mm/memory.c b/mm/memory.c
index b56254a875f8..ab6e76c95632 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3795,13 +3795,13 @@  static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 vm_fault_t do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct folio *swapcache, *folio = NULL;
+	struct folio *swapcache = NULL, *folio;
+	enum swap_cache_result cache_result;
 	struct page *page;
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool exclusive = false;
 	swp_entry_t entry;
-	bool swapcached;
 	pte_t pte;
 	vm_fault_t ret = 0;
 
@@ -3859,31 +3859,26 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(!si))
 		goto out;
 
-	folio = swap_cache_get_folio(entry, vma, vmf->address);
-	if (folio)
+	folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+			     vmf, &cache_result);
+	if (folio) {
 		page = folio_file_page(folio, swp_offset(entry));
-	swapcache = folio;
-
-	if (!folio) {
-		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     vmf, &swapcached);
-		if (folio) {
-			page = folio_file_page(folio, swp_offset(entry));
-			if (swapcached)
-				swapcache = folio;
-		} else {
-			/*
-			 * Back out if somebody else faulted in this pte
-			 * while we released the pte lock.
-			 */
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (likely(vmf->pte &&
-				   pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
-				ret = VM_FAULT_OOM;
-			goto unlock;
-		}
+		if (cache_result != SWAP_CACHE_BYPASS)
+			swapcache = folio;
+	} else {
+		/*
+		 * Back out if somebody else faulted in this pte
+		 * while we released the pte lock.
+		 */
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				vmf->address, &vmf->ptl);
+		if (likely(vmf->pte &&
+			   pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
+			ret = VM_FAULT_OOM;
+		goto unlock;
+	}
 
+	if (cache_result != SWAP_CACHE_HIT) {
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
diff --git a/mm/swap.h b/mm/swap.h
index 502a2801f817..1f4cdb324bf0 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -4,6 +4,22 @@ 
 
 struct mempolicy;
 
+/*
+ * Callers of swapin_entry may need to know the cache lookup result:
+ *
+ * SWAP_CACHE_HIT: cache hit, the cached folio is returned.
+ * SWAP_CACHE_MISS: cache miss, a folio is allocated, read from the swap
+ *                  device and added to the swap cache, but it may still
+ *                  return a cached folio if raced (check __read_swap_cache_async).
+ * SWAP_CACHE_BYPASS: cache miss, a newly allocated folio is read from
+ *                    the swap device, bypassing the cache.
+ */
+enum swap_cache_result {
+	SWAP_CACHE_HIT,
+	SWAP_CACHE_MISS,
+	SWAP_CACHE_BYPASS,
+};
+
 #ifdef CONFIG_SWAP
 #include <linux/blk_types.h> /* for bio_end_io_t */
 
@@ -55,7 +71,7 @@  struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
-			    struct vm_fault *vmf, bool *swapcached);
+			    struct vm_fault *vmf, enum swap_cache_result *result);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -87,7 +103,7 @@  static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 }
 
 static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf, bool *swapcached)
+			struct vm_fault *vmf, enum swap_cache_result *result)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 66ff187aa5d3..f6f1e6f5d782 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -917,8 +917,7 @@  static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vmf: fault information
- * @swapcached: pointer to a bool used as indicator if the
- *              page is swapped in through swapcache.
+ * @result: optional pointer that receives the swap cache lookup result.
  *
  * Returns the struct page for entry and addr, after queueing swapin.
  *
@@ -928,16 +927,22 @@  static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
  * or skip the readahead (ie, ramdisk based swap device).
  */
 struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
-			   struct vm_fault *vmf, bool *swapcached)
+			   struct vm_fault *vmf, enum swap_cache_result *result)
 {
+	enum swap_cache_result cache_result;
 	struct mempolicy *mpol;
 	struct folio *folio;
 	pgoff_t ilx;
-	bool cached;
+
+	folio = swap_cache_get_folio(entry, vmf->vma, vmf->address);
+	if (folio) {
+		cache_result = SWAP_CACHE_HIT;
+		goto done;
+	}
 
 	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
 		folio = swapin_direct(entry, gfp_mask, vmf);
-		cached = false;
+		cache_result = SWAP_CACHE_BYPASS;
 	} else {
 		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
 		if (swap_use_vma_readahead())
@@ -945,11 +950,12 @@  struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
 		else
 			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
 		mpol_cond_put(mpol);
-		cached = true;
+		cache_result = SWAP_CACHE_MISS;
 	}
 
-	if (swapcached)
-		*swapcached = cached;
+done:
+	if (result)
+		*result = cache_result;
 
 	return folio;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ce4e6c10dce7..5aa44de11edc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1845,6 +1845,13 @@  static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		int ret;
 		pte_t ptent;
 
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = addr,
+			.real_address = addr,
+			.pmd = pmd,
+		};
+
 		if (!pte++) {
 			pte = pte_offset_map(pmd, addr);
 			if (!pte)
@@ -1864,18 +1871,8 @@  static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		pte_unmap(pte);
 		pte = NULL;
 
-		folio = swap_cache_get_folio(entry, vma, addr);
-		if (!folio) {
-			struct vm_fault vmf = {
-				.vma = vma,
-				.address = addr,
-				.real_address = addr,
-				.pmd = pmd,
-			};
-
-			folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-					    &vmf, NULL);
-		}
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     &vmf, NULL);
 		if (!folio) {
 			/*
 			 * The entry could have been freed, and will not