
mm: fix a race on nr_swap_pages

Message ID 1606995362-16413-1-git-send-email-zhaoyang.huang@unisoc.com (mailing list archive)
State New, archived
Series: mm: fix a race on nr_swap_pages

Commit Message

Zhaoyang Huang Dec. 3, 2020, 11:36 a.m. UTC
"Free swap -4kB" shows up in my system, caused by get_swap_page_of_type()
 or get_swap_pages() racing with show_mem(). Remove the race here.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/swapfile.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
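
For illustration, the window in the old get_swap_page_of_type() looks roughly
like this (a sketched interleaving, not an actual trace; "CPU1" stands for any
show_mem() caller reading nr_swap_pages):

	/*
	 *  CPU0 (get_swap_page_of_type)       CPU1 (show_mem path)
	 *  -----------------------------      --------------------------
	 *  nr_swap_pages == 0
	 *  atomic_long_dec(&nr_swap_pages);
	 *                                     reads nr_swap_pages == -1,
	 *                                     prints "Free swap -4kB"
	 *  scan_swap_map(si, 1) fails;
	 *  atomic_long_inc(&nr_swap_pages);
	 */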

Comments

Zhaoyang Huang Dec. 3, 2020, 12:19 p.m. UTC | #1
To be precise, it is show_swap_cache_info() that races with get_swap_xxx().
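
For reference, the reader side is show_swap_cache_info() in mm/swap_state.c,
which looks roughly like this on kernels of that era (quoted from memory, so
check your tree):

	void show_swap_cache_info(void)
	{
		...
		printk("Free swap  = %ldkB\n",
			get_nr_swap_pages() << (PAGE_SHIFT - 10));
		printk("Total swap = %lukB\n",
			total_swap_pages << (PAGE_SHIFT - 10));
	}

where get_nr_swap_pages() is just atomic_long_read(&nr_swap_pages).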

On Thu, Dec 3, 2020 at 7:36 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> "Free swap -4kB" shows up in my system, caused by get_swap_page_of_type()
>  or get_swap_pages() racing with show_mem(). Remove the race here.
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  mm/swapfile.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index cf63b5f..13201b6 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -974,6 +974,8 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>         /* Only single cluster request supported */
>         WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
>
> +       spin_lock(&swap_avail_lock);
> +
>         avail_pgs = atomic_long_read(&nr_swap_pages) / size;
>         if (avail_pgs <= 0)
>                 goto noswap;
> @@ -986,8 +988,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>
>         atomic_long_sub(n_goal * size, &nr_swap_pages);
>
> -       spin_lock(&swap_avail_lock);
> -
>  start_over:
>         node = numa_node_id();
>         plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
> @@ -1061,14 +1061,13 @@ swp_entry_t get_swap_page_of_type(int type)
>
>         spin_lock(&si->lock);
>         if (si->flags & SWP_WRITEOK) {
> -               atomic_long_dec(&nr_swap_pages);
>                 /* This is called for allocating swap entry, not cache */
>                 offset = scan_swap_map(si, 1);
>                 if (offset) {
> +                       atomic_long_dec(&nr_swap_pages);
>                         spin_unlock(&si->lock);
>                         return swp_entry(type, offset);
>                 }
> -               atomic_long_inc(&nr_swap_pages);
>         }
>         spin_unlock(&si->lock);
>  fail:
> --
> 1.9.1
>
Vlastimil Babka Dec. 3, 2020, 6:03 p.m. UTC | #2
On 12/3/20 12:36 PM, Zhaoyang Huang wrote:
> "Free swap -4kB" shows up in my system, caused by get_swap_page_of_type()
>  or get_swap_pages() racing with show_mem(). Remove the race here.
> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  mm/swapfile.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index cf63b5f..13201b6 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -974,6 +974,8 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  	/* Only single cluster request supported */
>  	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
>  
> +	spin_lock(&swap_avail_lock);
> +
>  	avail_pgs = atomic_long_read(&nr_swap_pages) / size;
>  	if (avail_pgs <= 0)
>  		goto noswap;

This goto will leave the function with the spinlock still held, so that's a bug.
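
If the lock is to be taken that early, the error path would need to drop it
first, something like (sketch):

	if (avail_pgs <= 0) {
		spin_unlock(&swap_avail_lock);
		goto noswap;
	}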

> @@ -986,8 +988,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  
>  	atomic_long_sub(n_goal * size, &nr_swap_pages);
>  
> -	spin_lock(&swap_avail_lock);
> -

Is the problem that, while we adjust n_goal with min3(..., avail_pgs), somebody
else can decrease nr_swap_pages in the meantime, and then we underflow? If so,
the spinlock won't eliminate all such cases, it seems, as e.g.
get_swap_page_of_type() isn't done under the same lock, AFAIK.
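
A sketch of the underflow scenario in question (an assumed interleaving):

	/*
	 *  CPU0 (get_swap_pages)                 CPU1 (another allocator)
	 *  -----------------------------------   ------------------------
	 *  avail_pgs = read(nr_swap_pages)/size;    e.g. avail_pgs == 1
	 *  n_goal = min3(n_goal, SWAP_BATCH,
	 *                avail_pgs);                n_goal == 1
	 *                                         takes the last entry,
	 *                                         nr_swap_pages -> 0
	 *  atomic_long_sub(n_goal * size,
	 *                  &nr_swap_pages);         nr_swap_pages -> -1
	 */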

>  start_over:
>  	node = numa_node_id();
>  	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
> @@ -1061,14 +1061,13 @@ swp_entry_t get_swap_page_of_type(int type)
>  
>  	spin_lock(&si->lock);
>  	if (si->flags & SWP_WRITEOK) {
> -		atomic_long_dec(&nr_swap_pages);
>  		/* This is called for allocating swap entry, not cache */
>  		offset = scan_swap_map(si, 1);
>  		if (offset) {
> +			atomic_long_dec(&nr_swap_pages);
>  			spin_unlock(&si->lock);
>  			return swp_entry(type, offset);
>  		}
> -		atomic_long_inc(&nr_swap_pages);

This hunk looks safer, unless I'm missing something. Did you check whether it's
enough to prevent the negative values on your system?

>  	}
>  	spin_unlock(&si->lock);
>  fail:
>

Patch

diff --git a/mm/swapfile.c b/mm/swapfile.c
index cf63b5f..13201b6 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -974,6 +974,8 @@  int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 	/* Only single cluster request supported */
 	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
 
+	spin_lock(&swap_avail_lock);
+
 	avail_pgs = atomic_long_read(&nr_swap_pages) / size;
 	if (avail_pgs <= 0)
 		goto noswap;
@@ -986,8 +988,6 @@  int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 
 	atomic_long_sub(n_goal * size, &nr_swap_pages);
 
-	spin_lock(&swap_avail_lock);
-
 start_over:
 	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
@@ -1061,14 +1061,13 @@  swp_entry_t get_swap_page_of_type(int type)
 
 	spin_lock(&si->lock);
 	if (si->flags & SWP_WRITEOK) {
-		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
 		if (offset) {
+			atomic_long_dec(&nr_swap_pages);
 			spin_unlock(&si->lock);
 			return swp_entry(type, offset);
 		}
-		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
 fail: