diff mbox series

[v3,4/5] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio()

Message ID 20240528134513.2283548-5-wangkefeng.wang@huawei.com (mailing list archive)
State New
Headers show
Series mm: migrate: support poison recover from migrate folio | expand

Commit Message

Kefeng Wang May 28, 2024, 1:45 p.m. UTC
This is similar to __migrate_folio(), use folio_mc_copy() in HugeTLB
folio migration to avoid panic when copy from poisoned folio.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 fs/hugetlbfs/inode.c |  2 +-
 mm/migrate.c         | 13 ++++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

Comments

Tony Luck May 28, 2024, 3:41 p.m. UTC | #1
+	if (unlikely(folio_mc_copy(dst, src))) {
+		folio_ref_unfreeze(src, expected_count);
+		return -EFAULT;

It doesn't look like any code takes action to avoid re-using the poisoned page.

So you survived, hurrah! But left the problem page for some other code to trip over.

-Tony
Jane Chu May 28, 2024, 8:19 p.m. UTC | #2
On 5/28/2024 8:41 AM, Luck, Tony wrote:

> +	if (unlikely(folio_mc_copy(dst, src))) {
> +		folio_ref_unfreeze(src, expected_count);
> +		return -EFAULT;
>
> It doesn't look like any code takes action to avoid re-using the poisoned page.
>
> So you survived, hurrah! But left the problem page for some other code to trip over.

Tony,  did you mean that  memory_failure_queue() should be called?  If 
not, could you elaborate more?

thanks,

-jane

>
> -Tony
Tony Luck May 28, 2024, 8:30 p.m. UTC | #3
>> +	if (unlikely(folio_mc_copy(dst, src))) {
>> +		folio_ref_unfreeze(src, expected_count);
>> +		return -EFAULT;
>>
>> It doesn't look like any code takes action to avoid re-using the poisoned page.
>>
>> So you survived, hurrah! But left the problem page for some other code to trip over.
>
> Tony,  did you mean that  memory_failure_queue() should be called?  If 
> not, could you elaborate more?

Maybe memory_failure_queue() can help here. Though it would need to know
which pfn inside the folio hit the poison. So some more infrastructure around the
copy to make sure the pfn is saved.

-Tony
Jane Chu May 28, 2024, 10:02 p.m. UTC | #4
On 5/28/2024 1:30 PM, Luck, Tony wrote:

>>> +	if (unlikely(folio_mc_copy(dst, src))) {
>>> +		folio_ref_unfreeze(src, expected_count);
>>> +		return -EFAULT;
>>>
>>> It doesn't look like any code takes action to avoid re-using the poisoned page.
>>>
>>> So you survived, hurrah! But left the problem page for some other code to trip over.
>> Tony,  did you mean that  memory_failure_queue() should be called?  If
>> not, could you elaborate more?
> Maybe memory_failure_queue() can help here. Though it would need to know
> which pfn inside the folio hit the poison. So some more infrastructure around the
> copy to make sure the pfn is saved.

It looks like memory_failure_queue() is only invoked when source user 
page has a poison, not if the poison

is in source kernel page. Any idea why?

thanks!

-jane

> -Tony
>
Tony Luck May 28, 2024, 10:10 p.m. UTC | #5
> It looks like memory_failure_queue() is only invoked when source user 
> page has a poison, not if the poison
>
> is in source kernel page. Any idea why?

If it is a user page, Linux can unmap it from that process. Send
signal(SIGBUS) to the process if it was actively consuming the
poison (as opposed to poison found by patrol scrubber or by
speculative access).

Linux doesn't (yet) have a way to workaround poison in a source
kernel page.  I think the only place it will ever get that is for pages
in the page cache. Action to somehow unmap them from the file?

-Tony
Kefeng Wang May 29, 2024, 2:48 a.m. UTC | #6
On 2024/5/28 23:41, Luck, Tony wrote:
> +	if (unlikely(folio_mc_copy(dst, src))) {
> +		folio_ref_unfreeze(src, expected_count);
> +		return -EFAULT;
> 
> It doesn't look like any code takes action to avoid re-using the poisoned page.
> 
> So you survived, hurrah! But left the problem page for some other code to trip over.

Hi Tony, thanks for your review,

We tried to avoid calling memory_failure_queue() after 
copy_mc_{user_}highpage(), and I think the memory_failure() should be
called by  ARCH's code(eg, mce in x86)[1] to handle the poisoned page,
but for current mainline, the x86 mce don't do that, so yes, we need a
memory_failure_queue() for x86, but it is not true for upcoming
arm64, the poisoned page is handled by apei_claim_sea(),and a new
memory_failure_queue() is unnecessary(no issue since the
TestSetPageHWPoison() check in memory_failure()).

It seems that the khugepaged[3][4] should do the same thing, we could
call memory_failure_queue() in copy_mc_{user_}highpage(), and remove
it from each caller, is that OK?

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 00341b56d291..6b0d6f3c8580 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -352,6 +352,9 @@ static inline int copy_mc_user_highpage(struct page 
*to, struct page *from,
         kunmap_local(vto);
         kunmap_local(vfrom);

+       if (ret)
+               memory_failure_queue(page_to_pfn(from), 0);
+
         return ret;
  }

@@ -368,6 +371,9 @@ static inline int copy_mc_highpage(struct page *to, 
struct page *from)
         kunmap_local(vto);
         kunmap_local(vfrom);

+       if (ret)
+               memory_failure_queue(page_to_pfn(from), 0);
+
         return ret;
  }

Thanks.




[1] 
https://lore.kernel.org/linux-mm/20240204082627.3892816-3-tongtiangen@huawei.com/
[2] 
https://lore.kernel.org/linux-mm/20240528085915.1955987-1-tongtiangen@huawei.com/
[3] 12904d953364 mm/khugepaged: recover from poisoned file-backed memory
[4] 6efc7afb5cc9 mm/hwpoison: introduce copy_mc_highpage



> 
> -Tony
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 6df794ed4066..1107e5aa8343 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1128,7 +1128,7 @@  static int hugetlbfs_migrate_folio(struct address_space *mapping,
 		hugetlb_set_folio_subpool(src, NULL);
 	}
 
-	folio_migrate_copy(dst, src);
+	folio_migrate_flags(dst, src);
 
 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index e8acc6c38574..78cfad677789 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -532,15 +532,18 @@  int migrate_huge_page_move_mapping(struct address_space *mapping,
 				   struct folio *dst, struct folio *src)
 {
 	XA_STATE(xas, &mapping->i_pages, folio_index(src));
-	int expected_count;
+	int expected_count = folio_expected_refs(mapping, src);
 
-	xas_lock_irq(&xas);
-	expected_count = folio_expected_refs(mapping, src);
-	if (!folio_ref_freeze(src, expected_count)) {
-		xas_unlock_irq(&xas);
+	if (!folio_ref_freeze(src, expected_count))
 		return -EAGAIN;
+
+	if (unlikely(folio_mc_copy(dst, src))) {
+		folio_ref_unfreeze(src, expected_count);
+		return -EFAULT;
 	}
 
+	xas_lock_irq(&xas);
+
 	dst->index = src->index;
 	dst->mapping = src->mapping;