
mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory

Message ID 20240712064249.3882707-1-linmiaohe@huawei.com
State New
Series mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory

Commit Message

Miaohe Lin July 12, 2024, 6:42 a.m. UTC
While running memory failure tests recently, I hit the following panic:

page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
kernel BUG at include/linux/page-flags.h:616!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 unpoison_memory+0x2f3/0x590
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xd5/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08f0314887
RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887
RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001
RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00
 </TASK>
Modules linked in: hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The root cause is that unpoison_memory() checks the PG_HWPoison flag of
an uninitialized struct page, which triggers
VM_BUG_ON_PAGE(PagePoisoned(page)). This can be reproduced with the
following steps:
1. Offline a memory block:
 echo offline > /sys/devices/system/memory/memory12/state
2. Get the PFN of an offlined page:
 page-types -b n -rlN
3. Write that PFN to unpoison-pfn:
 echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
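Step 3 can also be driven from a small C helper. This is only a sketch. The name write_pfn() is hypothetical, and the helper assumes the unpoison-pfn debugfs attribute accepts a "0x"-prefixed hex value. simple_attr_write_xsigned() parses input with kstrtoll() in base 0, which auto-detects the prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a PFN, in "0x"-prefixed hex, to a debugfs
 * attribute such as /sys/kernel/debug/hwpoison/unpoison-pfn.
 * Returns 0 on success and -1 on failure.
 */
static int write_pfn(const char *path, unsigned long pfn)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "0x%lx\n", pfn) < 0)
		ret = -1;
	/* stdio buffers the write, so a kernel-side error shows up at close. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Calling write_pfn("/sys/kernel/debug/hwpoison/unpoison-pfn", pfn) with a PFN from step 2 requires root, a mounted debugfs, and the hwpoison_inject module. On an unfixed kernel built with CONFIG_DEBUG_VM_PGFLAGS, it should hit the BUG shown above.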

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Andrew Morton July 12, 2024, 9:09 p.m. UTC | #1
On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> When I did memory failure tests recently, below panic occurs:
> 
> [... panic log snipped ...]
> 
> The root cause is that unpoison_memory() tries to check the PG_HWPoison
> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
> triggered.

I'm not seeing the call path.  Is this BUG happening via

static __always_inline void __ClearPage##uname(struct page *page)	\
{									\
	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
	page->page_type |= PG_##lname;					\
}

?

If so, where's the callsite?

> This can be reproduced by below steps:
> 1.Offline memory block:
>  echo offline > /sys/devices/system/memory/memory12/state
> 2.Get offlined memory pfn:
>  page-types -b n -rlN
> 3.Write pfn to unpoison-pfn
>  echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
> 

I guess cc:stable.  It looks old?  Can you help to identify the Fixes:
target?

Thanks.
Miaohe Lin July 15, 2024, 6:23 a.m. UTC | #2
On 2024/7/13 5:09, Andrew Morton wrote:
> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> When I did memory failure tests recently, below panic occurs:
>>
>> [... panic log snipped ...]
>>
>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>> triggered.
> 
> I'm not seeing the call path.  Is this BUG happening via
> 
> static __always_inline void __ClearPage##uname(struct page *page)	\
> {									\
> 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
> 	page->page_type |= PG_##lname;					\
> }
> 
> ?
> 
> If so, where's the callsite?

The BUG is hit in PF_ANY():

PAGEFLAG(HWPoison, hwpoison, PF_ANY)

#define PF_ANY(page, enforce)	PF_POISONED_CHECK(page)

#define PF_POISONED_CHECK(page) ({					\
	VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);		\
	page; })

#define	PAGE_POISON_PATTERN	-1l
static inline int PagePoisoned(const struct page *page)
{
	return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
}

Offlined pages have page->flags set to PAGE_POISON_PATTERN while their PFNs are still valid:

offline_pages
  remove_pfn_range_from_zone
    page_init_poison
      memset(page, PAGE_POISON_PATTERN, size);
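The failure mode is easier to see in a userspace model of the macros quoted above. This is a sketch, not kernel code. struct page is reduced to its flags word, the PG_hwpoison bit position is made up, and the BUG is recorded in a flag instead of crashing. Once page_init_poison() has filled the memmap with 0xff bytes, every flag bit reads as set, PG_hwpoison included. Only PF_POISONED_CHECK() notices that the page is uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_POISON_PATTERN	-1l
#define PG_hwpoison		(1UL << 0)	/* hypothetical bit position */

struct page {
	unsigned long flags;	/* the only field modeled here */
};

/* Set where the kernel's VM_BUG_ON_PGFLAGS() would BUG(). */
static bool pgflags_bug_hit;

static bool PagePoisoned(const struct page *page)
{
	/* -1l converts to an all-ones unsigned long. */
	return page->flags == (unsigned long)PAGE_POISON_PATTERN;
}

/* Model of PF_POISONED_CHECK(): record the hit instead of crashing. */
static const struct page *pf_poisoned_check(const struct page *page)
{
	if (PagePoisoned(page))
		pgflags_bug_hit = true;
	return page;
}

/* Model of the PageHWPoison() test generated by PAGEFLAG(..., PF_ANY). */
static bool PageHWPoison(const struct page *page)
{
	return pf_poisoned_check(page)->flags & PG_hwpoison;
}

/* Model of page_init_poison(): overwrite whole struct pages with 0xff bytes. */
static void page_init_poison(struct page *page, size_t size)
{
	assert(size % sizeof(*page) == 0);
	memset(page, PAGE_POISON_PATTERN, size);
}
```

In the kernel, the check only fires with CONFIG_DEBUG_VM_PGFLAGS. Without that option, PageHWPoison() on such a page would simply return true.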

> 
>> This can be reproduced by below steps:
>> 1.Offline memory block:
>>  echo offline > /sys/devices/system/memory/memory12/state
>> 2.Get offlined memory pfn:
>>  page-types -b n -rlN
>> 3.Write pfn to unpoison-pfn
>>  echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
>>
> 
> I guess cc:stable.  It looks old?  Can you help to identify the Fixes:
> target?

Memory unpoison is only used for testing, and users usually won't pass in an offlined PFN (memory
offlining itself should be rare too), so I don't think this deserves cc: stable. But if a Fixes tag is
required, I think it should be:

Fixes: f165b378bbdf ("mm: uninitialized struct page poisoning sanity checking")

Thanks.
David Hildenbrand July 15, 2024, 4:16 p.m. UTC | #3
On 15.07.24 08:23, Miaohe Lin wrote:
> On 2024/7/13 5:09, Andrew Morton wrote:
>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>>> When I did memory failure tests recently, below panic occurs:
>>>
>>> [... panic log snipped ...]
>>>
>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>> triggered.
>>
>> I'm not seeing the call path.  Is this BUG happening via
>>
>> static __always_inline void __ClearPage##uname(struct page *page)	\
>> {									\
>> 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
>> 	page->page_type |= PG_##lname;					\
>> }
>>
>> ?
>>
>> If so, where's the callsite?
> 
> It is BUG on PF_ANY():
> 
> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
> 
> #define PF_ANY(page, enforce)	PF_POISONED_CHECK(page)
> 
> #define PF_POISONED_CHECK(page) ({					\
> 	VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);		\
> 	page; })
> 
> #define	PAGE_POISON_PATTERN	-1l
> static inline int PagePoisoned(const struct page *page)
> {
> 	return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
> }
> 
> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while pfn is still valid:
> 
> offline_pages
>    remove_pfn_range_from_zone
>      page_init_poison
>        memset(page, PAGE_POISON_PATTERN, size);

Worth noting that this happens after __offline_isolated_pages() marked 
the covering sections as offline.

Are we missing a pfn_to_online_page() check somewhere, or are we racing 
with offlining code that marks the section offline?
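The ordering David describes can be modeled in a few lines. This sketch uses a tiny hypothetical section size and no locking, so it does not address the race he asks about. The section is marked offline before its memmap is poisoned. A caller that goes through pfn_to_online_page() first therefore never reads the poisoned flags, even though pfn_valid() still succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGES_PER_SECTION	4	/* hypothetical, tiny for illustration */
#define NR_SECTIONS		2

struct page {
	unsigned long flags;
};

struct mem_section {
	bool online;
	struct page memmap[PAGES_PER_SECTION];
};

static struct mem_section sections[NR_SECTIONS];

static bool pfn_valid(unsigned long pfn)
{
	/* The memmap still exists for offlined sections, so this stays true. */
	return pfn < NR_SECTIONS * PAGES_PER_SECTION;
}

/* Model of pfn_to_online_page(): NULL unless the covering section is online. */
static struct page *pfn_to_online_page(unsigned long pfn)
{
	struct mem_section *ms;

	if (!pfn_valid(pfn))
		return NULL;
	ms = &sections[pfn / PAGES_PER_SECTION];
	if (!ms->online)
		return NULL;
	return &ms->memmap[pfn % PAGES_PER_SECTION];
}

/* Model of the order in offline_pages(): mark offline, then poison the memmap. */
static void offline_section(unsigned long nr)
{
	assert(nr < NR_SECTIONS);
	sections[nr].online = false;	/* done by __offline_isolated_pages() */
	memset(sections[nr].memmap, 0xff, sizeof(sections[nr].memmap)); /* page_init_poison() */
}
```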
Miaohe Lin July 16, 2024, 2:34 a.m. UTC | #4
On 2024/7/16 0:16, David Hildenbrand wrote:
> On 15.07.24 08:23, Miaohe Lin wrote:
>> On 2024/7/13 5:09, Andrew Morton wrote:
>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>
>>>> When I did memory failure tests recently, below panic occurs:
>>>>
>>>> [... panic log snipped ...]
>>>>
>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>> triggered.
>>>
>>> I'm not seeing the call path.  Is this BUG happening via
>>>
>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>> {                                    \
>>>     VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>     page->page_type |= PG_##lname;                    \
>>> }
>>>
>>> ?
>>>
>>> If so, where's the callsite?
>>
>> It is BUG on PF_ANY():
>>
>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>
>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>
>> #define PF_POISONED_CHECK(page) ({                    \
>>     VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>     page; })
>>
>> #define    PAGE_POISON_PATTERN    -1l
>> static inline int PagePoisoned(const struct page *page)
>> {
>>     return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>> }
>>
>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while pfn is still valid:
>>
>> offline_pages
>>    remove_pfn_range_from_zone
>>      page_init_poison
>>        memset(page, PAGE_POISON_PATTERN, size);
> 
> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
> 
> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?

I was thinking about using pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so that offlined pages are filtered out.
But there are also ZONE_DEVICE pages. They are not online either, and unpoison_memory() should work for them, so we can't simply use
pfn_to_online_page() there. Or am I missing something?

Thanks.
David Hildenbrand July 17, 2024, 9:01 a.m. UTC | #5
On 16.07.24 04:34, Miaohe Lin wrote:
> On 2024/7/16 0:16, David Hildenbrand wrote:
>> On 15.07.24 08:23, Miaohe Lin wrote:
>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>>> When I did memory failure tests recently, below panic occurs:
>>>>>
>>>>> [... panic log snipped ...]
>>>>>
>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>> triggered.
>>>>
>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>
>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>> {                                    \
>>>>      VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>      page->page_type |= PG_##lname;                    \
>>>> }
>>>>
>>>> ?
>>>>
>>>> If so, where's the callsite?
>>>
>>> It is BUG on PF_ANY():
>>>
>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>
>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>
>>> #define PF_POISONED_CHECK(page) ({                    \
>>>      VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>      page; })
>>>
>>> #define    PAGE_POISON_PATTERN    -1l
>>> static inline int PagePoisoned(const struct page *page)
>>> {
>>>      return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>> }
>>>
>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while pfn is still valid:
>>>
>>> offline_pages
>>>     remove_pfn_range_from_zone
>>>       page_init_poison
>>>         memset(page, PAGE_POISON_PATTERN, size);
>>
>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>
>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
> 
> I was thinking about to use pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
> But there're ZONE_DEVICE pages. They're not-onlined too. And unpoison_memory() should work for them. So we can't simply use
> pfn_to_online_page() in that. Or am I miss something?

Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be 
handled separately if pfn_to_online_page() would fail.

... which is what we do in memory_failure():

p = pfn_to_online_page(pfn);
if (!p) {
	if (pfn_valid(pfn)) {
		pgmap = get_dev_pagemap(pfn, NULL);
		put_ref_page(pfn, flags);
		if (pgmap) {
			...
		}
	}
	...
}
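The lookup order above can be sketched in userspace under some assumptions. The PFN ranges and classify() are hypothetical. get_dev_pagemap_model() stands in for get_dev_pagemap() and omits its reference counting. Returning -ENXIO for offline PFNs is an illustrative choice, not a statement about what memory_failure() returns. Online pages are accepted first, then ZONE_DEVICE pages, and no struct page flag is read for anything else.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
	unsigned long flags;
};

struct dev_pagemap {
	int id;
};

/* Hypothetical layout: [0,100) online, [100,200) offline, [200,300) ZONE_DEVICE. */
enum pfn_kind { PFN_INVALID, PFN_ONLINE, PFN_OFFLINE, PFN_DEVICE };

static struct page online_page, device_page;
static struct dev_pagemap device_pgmap = { .id = 1 };

static enum pfn_kind classify(unsigned long pfn)
{
	if (pfn < 100)
		return PFN_ONLINE;
	if (pfn < 200)
		return PFN_OFFLINE;
	if (pfn < 300)
		return PFN_DEVICE;
	return PFN_INVALID;
}

static bool pfn_valid(unsigned long pfn)
{
	return classify(pfn) != PFN_INVALID;
}

/* ZONE_DEVICE memory is never "online", so it is not returned here. */
static struct page *pfn_to_online_page(unsigned long pfn)
{
	return classify(pfn) == PFN_ONLINE ? &online_page : NULL;
}

/* Stand-in for get_dev_pagemap(pfn, NULL); reference counting omitted. */
static struct dev_pagemap *get_dev_pagemap_model(unsigned long pfn)
{
	return classify(pfn) == PFN_DEVICE ? &device_pgmap : NULL;
}

/* Online pages first, then ZONE_DEVICE pages; everything else is rejected. */
static int lookup_page(unsigned long pfn, struct page **out)
{
	struct page *p;

	assert(out != NULL);
	p = pfn_to_online_page(pfn);
	if (!p && pfn_valid(pfn) && get_dev_pagemap_model(pfn))
		p = &device_page;
	*out = p;
	return p ? 0 : -ENXIO;
}
```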
Miaohe Lin July 18, 2024, 3:04 a.m. UTC | #6
On 2024/7/17 17:01, David Hildenbrand wrote:
> On 16.07.24 04:34, Miaohe Lin wrote:
>> On 2024/7/16 0:16, David Hildenbrand wrote:
>>> On 15.07.24 08:23, Miaohe Lin wrote:
>>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>
>>>>>> When I did memory failure tests recently, below panic occurs:
>>>>>>
>>>>>> [... panic log snipped ...]
>>>>>>
>>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>>> triggered.
>>>>>
>>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>>
>>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>>> {                                    \
>>>>>      VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>>      page->page_type |= PG_##lname;                    \
>>>>> }
>>>>>
>>>>> ?
>>>>>
>>>>> If so, where's the callsite?
>>>>
>>>> It is BUG on PF_ANY():
>>>>
>>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>>
>>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>>
>>>> #define PF_POISONED_CHECK(page) ({                    \
>>>>      VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>>      page; })
>>>>
>>>> #define    PAGE_POISON_PATTERN    -1l
>>>> static inline int PagePoisoned(const struct page *page)
>>>> {
>>>>      return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>>> }
>>>>
>>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while pfn is still valid:
>>>>
>>>> offline_pages
>>>>     remove_pfn_range_from_zone
>>>>       page_init_poison
>>>>         memset(page, PAGE_POISON_PATTERN, size);
>>>
>>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>>
>>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
>>
>> I was thinking about to use pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
>> But there're ZONE_DEVICE pages. They're not-onlined too. And unpoison_memory() should work for them. So we can't simply use
>> pfn_to_online_page() in that. Or am I miss something?
> 
> Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be handled separately if pfn_to_online_page() would fail.
> 
> ... which is what we do in memory_failure():
> 
> p = pfn_to_online_page(pfn);
> if (!p) {
>     if (pfn_valid(pfn)) {
>         pgmap = get_dev_pagemap(pfn, NULL);
>         put_ref_page(pfn, flags);
>         if (pgmap) {
>             ...
>         }
>     }
>     ...
> }

Yup, this would be a good alternative. But would it be better to simply check PagePoisoned() instead?
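For comparison, here is a sketch of the PagePoisoned() alternative in a userspace model. unpoison_model(), the PG_hwpoison bit position, and the -EINVAL return are all hypothetical. The point is only that the poisoned check must run before PG_hwpoison is read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_POISON_PATTERN	-1l
#define PG_hwpoison		(1UL << 0)	/* hypothetical bit position */

struct page {
	unsigned long flags;
};

static bool PagePoisoned(const struct page *page)
{
	return page->flags == (unsigned long)PAGE_POISON_PATTERN;
}

/*
 * Hypothetical unpoison-like path: bail out on a poisoned (uninitialized)
 * struct page before PG_hwpoison is looked at.
 */
static int unpoison_model(struct page *p)
{
	assert(p != NULL);
	if (PagePoisoned(p))
		return -EINVAL;		/* illustrative error code */
	if (!(p->flags & PG_hwpoison))
		return 0;		/* not hwpoisoned: nothing to do */
	p->flags &= ~PG_hwpoison;
	return 0;
}
```

Unlike the pfn_to_online_page() approach, this check keys off the contents of the struct page rather than the online state of its section, and it still relies on pfn_to_page() for offline PFNs.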

Thanks.
David Hildenbrand July 18, 2024, 5:15 a.m. UTC | #7
On 18.07.24 05:04, Miaohe Lin wrote:
> On 2024/7/17 17:01, David Hildenbrand wrote:
>> On 16.07.24 04:34, Miaohe Lin wrote:
>>> On 2024/7/16 0:16, David Hildenbrand wrote:
>>>> On 15.07.24 08:23, Miaohe Lin wrote:
>>>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>
>>>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>>>> triggered.
>>>>>>
>>>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>>>
>>>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>>>> {                                    \
>>>>>>       VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>>>       page->page_type |= PG_##lname;                    \
>>>>>> }
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> If so, where's the callsite?
>>>>>
>>>>> It is the BUG check in PF_ANY():
>>>>>
>>>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>>>
>>>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>>>
>>>>> #define PF_POISONED_CHECK(page) ({                    \
>>>>>       VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>>>       page; })
>>>>>
>>>>> #define    PAGE_POISON_PATTERN    -1l
>>>>> static inline int PagePoisoned(const struct page *page)
>>>>> {
>>>>>       return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>>>> }
>>>>>
>>>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while the pfn is still valid:
>>>>>
>>>>> offline_pages
>>>>>      remove_pfn_range_from_zone
>>>>>        page_init_poison
>>>>>          memset(page, PAGE_POISON_PATTERN, size);
>>>>
>>>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>>>
>>>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
>>>
>>> I was thinking about using pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
>>> But there are ZONE_DEVICE pages. They're not onlined either, and unpoison_memory() should work for them. So we can't simply use
>>> pfn_to_online_page() there. Or am I missing something?
>>
>> Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be handled separately if pfn_to_online_page() would fail.
>>
>> ... which is what we do in memory_failure():
>>
>> p = pfn_to_online_page(pfn);
>> if (!p) {
>>      if (pfn_valid(pfn)) {
>>          pgmap = get_dev_pagemap(pfn, NULL);
>>          put_ref_page(pfn, flags);
>>          if (pgmap) {
>>              ...
>>          }
>>      }
>>      ...
>> }
> 
> Yup, this would be a good alternative. But would it be better to simply check PagePoisoned() instead?

The memmap of offline memory sections shall not be touched, so .... 
don't touch it ;)

Especially because that PagePoisoned() check is nonsensical without 
poisoning-during-memmap-init. You would still work with memory in 
offline sections.

I think the code is even wrong in that regard: we allow for memory 
offlining to work with HWPoisoned pages, see __offline_isolated_pages(). 
Staring at unpoison_memory(), we might be putting these pages back to 
the buddy? Which is completely wrong.


... not to mention that a function called "unpoison_memory()" doing 
nothing when it finds PagePoisoned() is completely confusing. Last but not 
least, take a look at the number of users of PagePoisoned().

Likely PagePoisoned() warrants a cleanup, but I am not sure yet what's the 
right thing to do.
Miaohe Lin July 19, 2024, 3:55 a.m. UTC | #8
On 2024/7/18 13:15, David Hildenbrand wrote:
> On 18.07.24 05:04, Miaohe Lin wrote:
>> On 2024/7/17 17:01, David Hildenbrand wrote:
>>> On 16.07.24 04:34, Miaohe Lin wrote:
>>>> On 2024/7/16 0:16, David Hildenbrand wrote:
>>>>> On 15.07.24 08:23, Miaohe Lin wrote:
>>>>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>>
>>>>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>>>>> triggered.
>>>>>>>
>>>>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>>>>
>>>>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>>>>> {                                    \
>>>>>>>       VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>>>>       page->page_type |= PG_##lname;                    \
>>>>>>> }
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> If so, where's the callsite?
>>>>>>
>>>>>> It is the BUG check in PF_ANY():
>>>>>>
>>>>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>>>>
>>>>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>>>>
>>>>>> #define PF_POISONED_CHECK(page) ({                    \
>>>>>>       VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>>>>       page; })
>>>>>>
>>>>>> #define    PAGE_POISON_PATTERN    -1l
>>>>>> static inline int PagePoisoned(const struct page *page)
>>>>>> {
>>>>>>       return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>>>>> }
>>>>>>
>>>>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while the pfn is still valid:
>>>>>>
>>>>>> offline_pages
>>>>>>      remove_pfn_range_from_zone
>>>>>>        page_init_poison
>>>>>>          memset(page, PAGE_POISON_PATTERN, size);
>>>>>
>>>>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>>>>
>>>>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
>>>>
>>>> I was thinking about using pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
>>>> But there are ZONE_DEVICE pages. They're not onlined either, and unpoison_memory() should work for them. So we can't simply use
>>>> pfn_to_online_page() there. Or am I missing something?
>>>
>>> Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be handled separately if pfn_to_online_page() would fail.
>>>
>>> ... which is what we do in memory_failure():
>>>
>>> p = pfn_to_online_page(pfn);
>>> if (!p) {
>>>      if (pfn_valid(pfn)) {
>>>          pgmap = get_dev_pagemap(pfn, NULL);
>>>          put_ref_page(pfn, flags);
>>>          if (pgmap) {
>>>              ...
>>>          }
>>>      }
>>>      ...
>>> }
>>
>> Yup, this would be a good alternative. But would it be better to simply check PagePoisoned() instead?
> 
> The memmap of offline memory sections shall not be touched, so .... don't touch it ;)
> 
> Especially because that PagePoisoned() check is nonsensical without poisoning-during-memmap-init. You would still work with memory in offline sections.
> 
> I think the code is even wrong in that regard: we allow for memory offlining to work with HWPoisoned pages, see __offline_isolated_pages(). Staring at unpoison_memory(), we might be putting these pages back to the buddy? Which is completely wrong.

I agree with you. Thanks for detailed explanation. :)
Thanks David.
David Hildenbrand Aug. 1, 2024, 8:24 p.m. UTC | #9
On 19.07.24 05:55, Miaohe Lin wrote:
> On 2024/7/18 13:15, David Hildenbrand wrote:
>> On 18.07.24 05:04, Miaohe Lin wrote:
>>> On 2024/7/17 17:01, David Hildenbrand wrote:
>>>> On 16.07.24 04:34, Miaohe Lin wrote:
>>>>> On 2024/7/16 0:16, David Hildenbrand wrote:
>>>>>> On 15.07.24 08:23, Miaohe Lin wrote:
>>>>>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>>>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>>>
>>>>>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>>>>>> triggered.
>>>>>>>>
>>>>>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>>>>>
>>>>>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>>>>>> {                                    \
>>>>>>>>        VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>>>>>        page->page_type |= PG_##lname;                    \
>>>>>>>> }
>>>>>>>>
>>>>>>>> ?
>>>>>>>>
>>>>>>>> If so, where's the callsite?
>>>>>>>
>>>>>>> It is the BUG check in PF_ANY():
>>>>>>>
>>>>>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>>>>>
>>>>>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>>>>>
>>>>>>> #define PF_POISONED_CHECK(page) ({                    \
>>>>>>>        VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>>>>>        page; })
>>>>>>>
>>>>>>> #define    PAGE_POISON_PATTERN    -1l
>>>>>>> static inline int PagePoisoned(const struct page *page)
>>>>>>> {
>>>>>>>        return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>>>>>> }
>>>>>>>
>>>>>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while the pfn is still valid:
>>>>>>>
>>>>>>> offline_pages
>>>>>>>       remove_pfn_range_from_zone
>>>>>>>         page_init_poison
>>>>>>>           memset(page, PAGE_POISON_PATTERN, size);
>>>>>>
>>>>>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>>>>>
>>>>>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
>>>>>
>>>>> I was thinking about using pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
>>>>> But there are ZONE_DEVICE pages. They're not onlined either, and unpoison_memory() should work for them. So we can't simply use
>>>>> pfn_to_online_page() there. Or am I missing something?
>>>>
>>>> Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be handled separately if pfn_to_online_page() would fail.
>>>>
>>>> ... which is what we do in memory_failure():
>>>>
>>>> p = pfn_to_online_page(pfn);
>>>> if (!p) {
>>>>       if (pfn_valid(pfn)) {
>>>>           pgmap = get_dev_pagemap(pfn, NULL);
>>>>           put_ref_page(pfn, flags);
>>>>           if (pgmap) {
>>>>               ...
>>>>           }
>>>>       }
>>>>       ...
>>>> }
>>>
>>> Yup, this would be a good alternative. But would it be better to simply check PagePoisoned() instead?
>>
>> The memmap of offline memory sections shall not be touched, so .... don't touch it ;)
>>
>> Especially because that PagePoisoned() check is nonsensical without poisoning-during-memmap-init. You would still work with memory in offline sections.
>>
>> I think the code is even wrong in that regard: we allow for memory offlining to work with HWPoisoned pages, see __offline_isolated_pages(). Staring at unpoison_memory(), we might be putting these pages back to the buddy? Which is completely wrong.
> 
> I agree with you. Thanks for the detailed explanation. :)
> Thanks David.

So ... I assume there will be a new patch? :)
Miaohe Lin Aug. 5, 2024, 6:25 a.m. UTC | #10
On 2024/8/2 4:24, David Hildenbrand wrote:
> On 19.07.24 05:55, Miaohe Lin wrote:
>> On 2024/7/18 13:15, David Hildenbrand wrote:
>>> On 18.07.24 05:04, Miaohe Lin wrote:
>>>> On 2024/7/17 17:01, David Hildenbrand wrote:
>>>>> On 16.07.24 04:34, Miaohe Lin wrote:
>>>>>> On 2024/7/16 0:16, David Hildenbrand wrote:
>>>>>>> On 15.07.24 08:23, Miaohe Lin wrote:
>>>>>>>> On 2024/7/13 5:09, Andrew Morton wrote:
>>>>>>>>> On Fri, 12 Jul 2024 14:42:49 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>>>>
>>>>>>>>>> The root cause is that unpoison_memory() tries to check the PG_HWPoison
>>>>>>>>>> flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
>>>>>>>>>> triggered.
>>>>>>>>>
>>>>>>>>> I'm not seeing the call path.  Is this BUG happening via
>>>>>>>>>
>>>>>>>>> static __always_inline void __ClearPage##uname(struct page *page)    \
>>>>>>>>> {                                    \
>>>>>>>>>        VM_BUG_ON_PAGE(!Page##uname(page), page);            \
>>>>>>>>>        page->page_type |= PG_##lname;                    \
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> If so, where's the callsite?
>>>>>>>>
>>>>>>>> It is the BUG check in PF_ANY():
>>>>>>>>
>>>>>>>> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>>>>>>>>
>>>>>>>> #define PF_ANY(page, enforce)    PF_POISONED_CHECK(page)
>>>>>>>>
>>>>>>>> #define PF_POISONED_CHECK(page) ({                    \
>>>>>>>>        VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);        \
>>>>>>>>        page; })
>>>>>>>>
>>>>>>>> #define    PAGE_POISON_PATTERN    -1l
>>>>>>>> static inline int PagePoisoned(const struct page *page)
>>>>>>>> {
>>>>>>>>        return READ_ONCE(page->flags) == PAGE_POISON_PATTERN;
>>>>>>>> }
>>>>>>>>
>>>>>>>> The offlined pages will have page->flags set to PAGE_POISON_PATTERN while the pfn is still valid:
>>>>>>>>
>>>>>>>> offline_pages
>>>>>>>>       remove_pfn_range_from_zone
>>>>>>>>         page_init_poison
>>>>>>>>           memset(page, PAGE_POISON_PATTERN, size);
>>>>>>>
>>>>>>> Worth noting that this happens after __offline_isolated_pages() marked the covering sections as offline.
>>>>>>>
>>>>>>> Are we missing a pfn_to_online_page() check somewhere, or are we racing with offlining code that marks the section offline?
>>>>>>
>>>>>> I was thinking about using pfn_to_online_page() instead of pfn_to_page() in unpoison_memory() so we can get rid of offlined pages.
>>>>>> But there are ZONE_DEVICE pages. They're not onlined either, and unpoison_memory() should work for them. So we can't simply use
>>>>>> pfn_to_online_page() there. Or am I missing something?
>>>>>
>>>>> Right, pfn_to_online_page() does not detect ZONE_DEVICE. That has to be handled separately if pfn_to_online_page() would fail.
>>>>>
>>>>> ... which is what we do in memory_failure():
>>>>>
>>>>> p = pfn_to_online_page(pfn);
>>>>> if (!p) {
>>>>>       if (pfn_valid(pfn)) {
>>>>>           pgmap = get_dev_pagemap(pfn, NULL);
>>>>>           put_ref_page(pfn, flags);
>>>>>           if (pgmap) {
>>>>>               ...
>>>>>           }
>>>>>       }
>>>>>       ...
>>>>> }
>>>>
>>>> Yup, this would be a good alternative. But would it be better to simply check PagePoisoned() instead?
>>>
>>> The memmap of offline memory sections shall not be touched, so .... don't touch it ;)
>>>
>>> Especially because that PagePoisoned() check is nonsensical without poisoning-during-memmap-init. You would still work with memory in offline sections.
>>>
>>> I think the code is even wrong in that regard: we allow for memory offlining to work with HWPoisoned pages, see __offline_isolated_pages(). Staring at unpoison_memory(), we might be putting these pages back to the buddy? Which is completely wrong.
>>
>> I agree with you. Thanks for the detailed explanation. :)
>> Thanks David.
> 
> So ... I assume there will be a new patch? :)

I just got back from a two-week holiday. ;) I will try to send a new version when possible.

Thanks.

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 581d3e5c9117..8c765329829f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2564,6 +2564,13 @@  int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
+	if (PagePoisoned(p)) {
+		unpoison_pr_info("%#lx: page is uninitialized\n",
+				 pfn, &unpoison_rs);
+		ret = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+
 	if (!PageHWPoison(p)) {
 		unpoison_pr_info("%#lx: page was already unpoisoned\n",
 				 pfn, &unpoison_rs);