diff mbox series

[5/6] mm, hwpoison: kill procs if unmap fails

Message ID 20220818130016.45313-6-linmiaohe@huawei.com (mailing list archive)
State New
Headers show
Series A few fixup patches for hugetlb | expand

Commit Message

Miaohe Lin Aug. 18, 2022, 1 p.m. UTC
If try_to_unmap() fails, the hwpoisoned page still resides in the address
space of some processes. We should kill these processes or the hwpoisoned
page might be consumed later. collect_procs() is always called to collect
relevant processes now so they can be killed later if unmap fails.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

HORIGUCHI NAOYA(堀口 直也) Aug. 19, 2022, 5:24 a.m. UTC | #1
On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
> If try_to_unmap() fails, the hwpoisoned page still resides in the address
> space of some processes. We should kill these processes or the hwpoisoned
> page might be consumed later. collect_procs() is always called to collect
> relevant processes now so they can be killed later if unmap fails.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a2f4e8b00a26..5f9615a86296 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	struct address_space *mapping;
>  	LIST_HEAD(tokill);
>  	bool unmap_success;
> -	int kill = 1, forcekill;
> +	int forcekill;
>  	bool mlocked = PageMlocked(hpage);
>  
>  	/*
> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  		if (page_mkclean(hpage)) {
>  			SetPageDirty(hpage);
>  		} else {
> -			kill = 0;
>  			ttu |= TTU_IGNORE_HWPOISON;
>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
>  				pfn);
> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 * Error handling: We ignore errors here because
>  	 * there's nothing that can be done.

This above comment might be deprecated now (I'm not sure what this really mean),
so could you drop or update this?

Anyway, the patch looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

>  	 */
> -	if (kill)
> -		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
> +	collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
>  
>  	if (PageHuge(hpage) && !PageAnon(hpage)) {
>  		/*
> @@ -1495,7 +1493,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 * use a more force-full uncatchable kill to prevent
>  	 * any accesses to the poisoned memory.
>  	 */
> -	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
> +	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
> +		    !unmap_success;
>  	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
>  
>  	return unmap_success;
> -- 
> 2.23.0
Miaohe Lin Aug. 19, 2022, 7:37 a.m. UTC | #2
On 2022/8/19 13:24, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
>> If try_to_unmap() fails, the hwpoisoned page still resides in the address
>> space of some processes. We should kill these processes or the hwpoisoned
>> page might be consumed later. collect_procs() is always called to collect
>> relevant processes now so they can be killed later if unmap fails.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 9 ++++-----
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index a2f4e8b00a26..5f9615a86296 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  	struct address_space *mapping;
>>  	LIST_HEAD(tokill);
>>  	bool unmap_success;
>> -	int kill = 1, forcekill;
>> +	int forcekill;
>>  	bool mlocked = PageMlocked(hpage);
>>  
>>  	/*
>> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  		if (page_mkclean(hpage)) {
>>  			SetPageDirty(hpage);
>>  		} else {
>> -			kill = 0;
>>  			ttu |= TTU_IGNORE_HWPOISON;
>>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
>>  				pfn);
>> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  	 * Error handling: We ignore errors here because
>>  	 * there's nothing that can be done.
> 
> This above comment might be deprecated now (I'm not sure what this really mean),
> so could you drop or update this?

Do you mean remove the below comment? In fact, this doesn't make much sense for me.

* Error handling: We ignore errors here because
* there's nothing that can be done.

> 
> Anyway, the patch looks good to me.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Many thanks for review.

Thanks,
Miaohe Lin
HORIGUCHI NAOYA(堀口 直也) Aug. 19, 2022, 8:18 a.m. UTC | #3
On Fri, Aug 19, 2022 at 03:37:23PM +0800, Miaohe Lin wrote:
> On 2022/8/19 13:24, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
> >> If try_to_unmap() fails, the hwpoisoned page still resides in the address
> >> space of some processes. We should kill these processes or the hwpoisoned
> >> page might be consumed later. collect_procs() is always called to collect
> >> relevant processes now so they can be killed later if unmap fails.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 9 ++++-----
> >>  1 file changed, 4 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index a2f4e8b00a26..5f9615a86296 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  	struct address_space *mapping;
> >>  	LIST_HEAD(tokill);
> >>  	bool unmap_success;
> >> -	int kill = 1, forcekill;
> >> +	int forcekill;
> >>  	bool mlocked = PageMlocked(hpage);
> >>  
> >>  	/*
> >> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  		if (page_mkclean(hpage)) {
> >>  			SetPageDirty(hpage);
> >>  		} else {
> >> -			kill = 0;
> >>  			ttu |= TTU_IGNORE_HWPOISON;
> >>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
> >>  				pfn);
> >> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  	 * Error handling: We ignore errors here because
> >>  	 * there's nothing that can be done.
> > 
> > This above comment might be deprecated now (I'm not sure what this really mean),
> > so could you drop or update this?
> 
> Do you mean remove the below comment? In fact, this doesn't make much sense for me.
> 
> * Error handling: We ignore errors here because
> * there's nothing that can be done.

Yes, that's what I meant.

Thanks,
Naoya Horiguchi
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a2f4e8b00a26..5f9615a86296 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1396,7 +1396,7 @@  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	struct address_space *mapping;
 	LIST_HEAD(tokill);
 	bool unmap_success;
-	int kill = 1, forcekill;
+	int forcekill;
 	bool mlocked = PageMlocked(hpage);
 
 	/*
@@ -1437,7 +1437,6 @@  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 		if (page_mkclean(hpage)) {
 			SetPageDirty(hpage);
 		} else {
-			kill = 0;
 			ttu |= TTU_IGNORE_HWPOISON;
 			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
 				pfn);
@@ -1452,8 +1451,7 @@  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * Error handling: We ignore errors here because
 	 * there's nothing that can be done.
 	 */
-	if (kill)
-		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
+	collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
 
 	if (PageHuge(hpage) && !PageAnon(hpage)) {
 		/*
@@ -1495,7 +1493,8 @@  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * use a more force-full uncatchable kill to prevent
 	 * any accesses to the poisoned memory.
 	 */
-	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
+	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
+		    !unmap_success;
 	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
 
 	return unmap_success;