diff mbox series

[v2,4/8] mm/memory-failure.c: fix race with changing page more robustly

Message ID 20220216091431.39406-5-linmiaohe@huawei.com (mailing list archive)
State New
Series A few cleanup and fixup patches for memory failure

Commit Message

Miaohe Lin Feb. 16, 2022, 9:14 a.m. UTC
After the THP is split in memory_failure(), we only intend to deal with a
non-compound page. However, the page could have changed compound pages due
to a race window. If this happens, we can try again and hopefully handle the
page in the next round. Also remove the unneeded orig_head: it is always
equal to hpage, so we can use hpage directly.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

HORIGUCHI NAOYA(堀口 直也) Feb. 18, 2022, 1:13 a.m. UTC | #1
On Wed, Feb 16, 2022 at 05:14:27PM +0800, Miaohe Lin wrote:
> We're only intended to deal with the non-Compound page after we split thp
> in memory_failure. However, the page could have changed compound pages due
> to race window. If this happens, we could try again to hopefully handle the
> page next round. Also remove unneeded orig_head. It's always equal to the
> hpage. So we can use hpage directly and remove this redundant one.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7e205d91b2d7..d66f642888be 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1690,7 +1690,6 @@ int memory_failure(unsigned long pfn, int flags)
>  {
>  	struct page *p;
>  	struct page *hpage;
> -	struct page *orig_head;
>  	struct dev_pagemap *pgmap;
>  	int res = 0;
>  	unsigned long page_flags;
> @@ -1736,7 +1735,7 @@ int memory_failure(unsigned long pfn, int flags)
>  		goto unlock_mutex;
>  	}
>  
> -	orig_head = hpage = compound_head(p);
> +	hpage = compound_head(p);
>  	num_poisoned_pages_inc();
>  
>  	/*
> @@ -1817,13 +1816,18 @@ int memory_failure(unsigned long pfn, int flags)
>  	lock_page(p);
>  
>  	/*
> -	 * The page could have changed compound pages during the locking.
> -	 * If this happens just bail out.
> +	 * We're only intended to deal with the non-Compound page here.
> +	 * However, the page could have changed compound pages due to
> +	 * race window. If this happens, we could try again to hopefully
> +	 * handle the page next round.
>  	 */
> -	if (PageCompound(p) && compound_head(p) != orig_head) {
> -		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> -		res = -EBUSY;
> -		goto unlock_page;
> +	if (PageCompound(p)) {
> +		if (TestClearPageHWPoison(p))
> +			num_poisoned_pages_dec();
> +		unlock_page(p);
> +		put_page(p);
> +		flags &= ~MF_COUNT_INCREASED;

Could you limit the retry to a single attempt by using a local variable
"retry"?  Hitting the race more than once in a single error event is probably
very rare, but this would guard against a potential infinite loop (one that
future changes could open up).

Thanks,
Naoya Horiguchi

> +		goto try_again;
>  	}
>  
>  	/*
> -- 
> 2.23.0
Miaohe Lin Feb. 18, 2022, 1:53 a.m. UTC | #2
On 2022/2/18 9:13, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Feb 16, 2022 at 05:14:27PM +0800, Miaohe Lin wrote:
>> We're only intended to deal with the non-Compound page after we split thp
>> in memory_failure. However, the page could have changed compound pages due
>> to race window. If this happens, we could try again to hopefully handle the
>> page next round. Also remove unneeded orig_head. It's always equal to the
>> hpage. So we can use hpage directly and remove this redundant one.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 20 ++++++++++++--------
>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7e205d91b2d7..d66f642888be 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1690,7 +1690,6 @@ int memory_failure(unsigned long pfn, int flags)
>>  {
>>  	struct page *p;
>>  	struct page *hpage;
>> -	struct page *orig_head;
>>  	struct dev_pagemap *pgmap;
>>  	int res = 0;
>>  	unsigned long page_flags;
>> @@ -1736,7 +1735,7 @@ int memory_failure(unsigned long pfn, int flags)
>>  		goto unlock_mutex;
>>  	}
>>  
>> -	orig_head = hpage = compound_head(p);
>> +	hpage = compound_head(p);
>>  	num_poisoned_pages_inc();
>>  
>>  	/*
>> @@ -1817,13 +1816,18 @@ int memory_failure(unsigned long pfn, int flags)
>>  	lock_page(p);
>>  
>>  	/*
>> -	 * The page could have changed compound pages during the locking.
>> -	 * If this happens just bail out.
>> +	 * We're only intended to deal with the non-Compound page here.
>> +	 * However, the page could have changed compound pages due to
>> +	 * race window. If this happens, we could try again to hopefully
>> +	 * handle the page next round.
>>  	 */
>> -	if (PageCompound(p) && compound_head(p) != orig_head) {
>> -		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
>> -		res = -EBUSY;
>> -		goto unlock_page;
>> +	if (PageCompound(p)) {
>> +		if (TestClearPageHWPoison(p))
>> +			num_poisoned_pages_dec();
>> +		unlock_page(p);
>> +		put_page(p);
>> +		flags &= ~MF_COUNT_INCREASED;
> 
> Could you limit the retry chance only once by using the local variable
> "retry"?  It might be very rare to hit the race more than once in a single
> error event, but just to be safe from potential infinite loop (that could be
> opened by future changes).
> 

Sure. Will do it in V3. Thanks.

> Thanks,
> Naoya Horiguchi
> 
>> +		goto try_again;
>>  	}
>>  
>>  	/*
>> -- 
>> 2.23.0
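
The single retry requested in the review could look roughly like the
following. This is a standalone userspace model in C, not the V3 patch:
struct model_page, model_num_poisoned, MODEL_EBUSY and
model_memory_failure() are hypothetical names invented for illustration,
and the compound_rounds field only simulates how many times the race is
hit. What the real code should do once the retry is used up is not settled
in this thread; the sketch assumes it returns -EBUSY, as the pre-patch code
did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only what this sketch needs. */
struct model_page {
	bool hwpoison;       /* models the PageHWPoison flag */
	int compound_rounds; /* rounds in which the page still looks compound */
};

/* Models the counter behind num_poisoned_pages_inc()/dec(). */
long model_num_poisoned;

enum { MODEL_EBUSY = 16 };

/*
 * Returns 0 if the page was handled, or -MODEL_EBUSY if the race was hit
 * again after the single allowed retry (assumed behaviour, see above).
 */
int model_memory_failure(struct model_page *p)
{
	bool retry = true; /* the local variable suggested in the review */

	assert(p != NULL);

try_again:
	/* Mark the page poisoned and account for it. */
	p->hwpoison = true;
	model_num_poisoned++;

	if (p->compound_rounds > 0) {
		/* Simulated race: the page is compound when we look at it. */
		p->compound_rounds--;

		if (retry) {
			/* Undo the accounting before starting over. */
			if (p->hwpoison) {
				p->hwpoison = false;
				model_num_poisoned--;
			}
			retry = false; /* at most one extra round */
			goto try_again;
		}
		return -MODEL_EBUSY;
	}

	return 0;
}
```

With compound_rounds set to 1 the call succeeds on the second round; with a
larger value it gives up after two rounds instead of looping.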

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7e205d91b2d7..d66f642888be 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1690,7 +1690,6 @@  int memory_failure(unsigned long pfn, int flags)
 {
 	struct page *p;
 	struct page *hpage;
-	struct page *orig_head;
 	struct dev_pagemap *pgmap;
 	int res = 0;
 	unsigned long page_flags;
@@ -1736,7 +1735,7 @@  int memory_failure(unsigned long pfn, int flags)
 		goto unlock_mutex;
 	}
 
-	orig_head = hpage = compound_head(p);
+	hpage = compound_head(p);
 	num_poisoned_pages_inc();
 
 	/*
@@ -1817,13 +1816,18 @@  int memory_failure(unsigned long pfn, int flags)
 	lock_page(p);
 
 	/*
-	 * The page could have changed compound pages during the locking.
-	 * If this happens just bail out.
+	 * We're only intended to deal with the non-Compound page here.
+	 * However, the page could have changed compound pages due to
+	 * race window. If this happens, we could try again to hopefully
+	 * handle the page next round.
 	 */
-	if (PageCompound(p) && compound_head(p) != orig_head) {
-		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
-		res = -EBUSY;
-		goto unlock_page;
+	if (PageCompound(p)) {
+		if (TestClearPageHWPoison(p))
+			num_poisoned_pages_dec();
+		unlock_page(p);
+		put_page(p);
+		flags &= ~MF_COUNT_INCREASED;
+		goto try_again;
 	}
 
 	/*