
mm/memory-failure: release private data before split THP

Message ID 20220804025121.4001361-1-fengwei.yin@intel.com
State New
Series mm/memory-failure: release private data before split THP

Commit Message

Yin Fengwei Aug. 4, 2022, 2:51 a.m. UTC
If private data is attached to a THP, it holds an extra reference
on the THP and blocks the THP split. This can in turn prevent the
memory failure from being recovered.

Release the private data attached to the THP before splitting it,
to increase the chance that the split succeeds.

The issue was hit during HW error injection testing with a 5.18
kernel and xfs as the rootfs. The test got killed, and a system
reboot was required to re-run it.

The issue was tracked down to a THP split failure, which caused the
memory failure not to be handled. The page dump showed:

[ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
[ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
[ 1785.452408] memcg:ff4247f2d28e9000
[ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
[ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
[ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000

It looks like the error was injected into a large xfs folio with
private data attached.

With the private data released before splitting the THP, the test
case ran successfully many times without rebooting the system.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
---
Changelog from RFC:
 - Use the new folio API per Matthew Wilcox's suggestion
 - Add a one-line comment before re-getting the folio of the page,
   per Miaohe's comment
 - Remove RFC tag
 - Add Co-developed-by for Qiuxu, who did a lot of the debugging
   work to locate the real issue

 mm/memory-failure.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)


base-commit: f86d1fbbe7858884d6754534a0afbb74fc30bc26

Comments

Miaohe Lin Aug. 4, 2022, 3:19 a.m. UTC | #1
On 2022/8/4 10:51, Yin Fengwei wrote:

Looks good to me. Thanks.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Yin Fengwei Aug. 4, 2022, 3:21 a.m. UTC | #2
Hi Miaohe,

On 8/4/2022 11:19 AM, Miaohe Lin wrote:
> On 2022/8/4 10:51, Yin Fengwei wrote:
> 
> Looks good to me. Thanks.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks a lot for reviewing the patch.

Regards
Yin, Fengwei
Yang Shi Aug. 4, 2022, 5:39 p.m. UTC | #3
On Wed, Aug 3, 2022 at 7:52 PM Yin Fengwei <fengwei.yin@intel.com> wrote:

Yes, now we have more large file pages/THPs than before. The patch
itself looks good to me. But I'm wondering whether it would be better
to release the buffers in split_huge_page() itself, since other
callsites may experience the same issue. Previously only anonymous
and shmem THPs were supported, so we didn't have to worry about the
extra pin from buffers, but it may be time to consider it now.

Yin Fengwei Aug. 5, 2022, 12:18 a.m. UTC | #4
On 2022/8/5 01:39, Yang Shi wrote:
> On Wed, Aug 3, 2022 at 7:52 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
> 
> Yes, now we have more large file pages/THPs than before. The patch
> itself looks good to me. But I'm wondering whether it would be better
> to release the buffers in split_huge_page() itself, since other
> callsites may experience the same issue. Previously only anonymous
> and shmem THPs were supported, so we didn't have to worry about the
> extra pin from buffers, but it may be time to consider it now.
Agreed. If there are no further comments, I will send a new patch
that moves the private data release into split_huge_page_to_list().


Regards
Yin, Fengwei

Miaohe Lin Aug. 5, 2022, 1:33 a.m. UTC | #5
On 2022/8/5 1:39, Yang Shi wrote:
> On Wed, Aug 3, 2022 at 7:52 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
> 
> Yes, now we have more large file pages/THPs than before. The patch
> itself looks good to me. But I'm wondering whether it would be better
> to release the buffers in split_huge_page() itself, since other
> callsites may experience the same issue. Previously only anonymous
> and shmem THPs were supported, so we didn't have to worry about the
> extra pin from buffers, but it may be time to consider it now.

I tend to agree with this idea. Thanks, Yang.


Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b864c2eff641..ef87741b0fea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1484,16 +1484,24 @@  static int identify_page_state(unsigned long pfn, struct page *p,
 
 static int try_to_split_thp_page(struct page *page, const char *msg)
 {
-	lock_page(page);
+	struct folio *folio = page_folio(page);
+
+	folio_lock(folio);
+	if (folio_test_private(folio))
+		filemap_release_folio(folio, GFP_KERNEL);
+
 	if (unlikely(split_huge_page(page))) {
 		unsigned long pfn = page_to_pfn(page);
 
-		unlock_page(page);
+		folio_unlock(folio);
 		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
-		put_page(page);
+		folio_put(folio);
 		return -EBUSY;
 	}
-	unlock_page(page);
+
+	/* If split_huge_page success, folio could be different */
+	folio = page_folio(page);
+	folio_unlock(folio);
 
 	return 0;
 }