Message ID | 20250113082718.1872494-3-mawupeng1@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: memory_failure: unmap poisoned filio during migrate properly | expand |
On 13.01.25 09:27, Wupeng Ma wrote: > From: Ma Wupeng <mawupeng1@huawei.com> > > Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to > be offlined) add page poison checks in do_migrate_range in order to make > offline hwpoisoned page possible by introducing isolate_lru_page and > try_to_unmap for hwpoisoned page. However folio lock must be held before > calling try_to_unmap. Add it to fix this problem. > > Waring will be produced if filio is not locked during unmap: > > ------------[ cut here ]------------ > kernel BUG at ./include/linux/swapops.h:400! > Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP > Modules linked in: > CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 > Tainted: [W]=WARN > Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : try_to_unmap_one+0xb08/0xd3c > lr : try_to_unmap_one+0x3dc/0xd3c > Call trace: > try_to_unmap_one+0xb08/0xd3c (P) > try_to_unmap_one+0x3dc/0xd3c (L) > rmap_walk_anon+0xdc/0x1f8 > rmap_walk+0x3c/0x58 > try_to_unmap+0x88/0x90 > unmap_poisoned_folio+0x30/0xa8 > do_migrate_range+0x4a0/0x568 > offline_pages+0x5a4/0x670 > memory_block_action+0x17c/0x374 > memory_subsys_offline+0x3c/0x78 > device_offline+0xa4/0xd0 > state_store+0x8c/0xf0 > dev_attr_store+0x18/0x2c > sysfs_kf_write+0x44/0x54 > kernfs_fop_write_iter+0x118/0x1a8 > vfs_write+0x3a8/0x4bc > ksys_write+0x6c/0xf8 > __arm64_sys_write+0x1c/0x28 > invoke_syscall+0x44/0x100 > el0_svc_common.constprop.0+0x40/0xe0 > do_el0_svc+0x1c/0x28 > el0_svc+0x30/0xd0 > el0t_64_sync_handler+0xc8/0xcc > el0t_64_sync+0x198/0x19c > Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) > ---[ end trace 0000000000000000 ]--- > > Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") > Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> > --- > mm/memory_hotplug.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 330668d37e44..9bedecfc3577 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { > if (WARN_ON(folio_test_lru(folio))) > folio_isolate_lru(folio); > - if (folio_mapped(folio)) > + if (folio_mapped(folio)) { > + folio_lock(folio); > unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); > + folio_unlock(folio); > + } The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event). If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic? I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :)
On 2025/1/13 20:35, David Hildenbrand wrote: > On 13.01.25 09:27, Wupeng Ma wrote: >> From: Ma Wupeng <mawupeng1@huawei.com> >> >> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to >> be offlined) add page poison checks in do_migrate_range in order to make >> offline hwpoisoned page possible by introducing isolate_lru_page and >> try_to_unmap for hwpoisoned page. However folio lock must be held before >> calling try_to_unmap. Add it to fix this problem. >> >> Waring will be produced if filio is not locked during unmap: >> >> ------------[ cut here ]------------ >> kernel BUG at ./include/linux/swapops.h:400! >> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP >> Modules linked in: >> CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 >> Tainted: [W]=WARN >> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 >> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : try_to_unmap_one+0xb08/0xd3c >> lr : try_to_unmap_one+0x3dc/0xd3c >> Call trace: >> try_to_unmap_one+0xb08/0xd3c (P) >> try_to_unmap_one+0x3dc/0xd3c (L) >> rmap_walk_anon+0xdc/0x1f8 >> rmap_walk+0x3c/0x58 >> try_to_unmap+0x88/0x90 >> unmap_poisoned_folio+0x30/0xa8 >> do_migrate_range+0x4a0/0x568 >> offline_pages+0x5a4/0x670 >> memory_block_action+0x17c/0x374 >> memory_subsys_offline+0x3c/0x78 >> device_offline+0xa4/0xd0 >> state_store+0x8c/0xf0 >> dev_attr_store+0x18/0x2c >> sysfs_kf_write+0x44/0x54 >> kernfs_fop_write_iter+0x118/0x1a8 >> vfs_write+0x3a8/0x4bc >> ksys_write+0x6c/0xf8 >> __arm64_sys_write+0x1c/0x28 >> invoke_syscall+0x44/0x100 >> el0_svc_common.constprop.0+0x40/0xe0 >> do_el0_svc+0x1c/0x28 >> el0_svc+0x30/0xd0 >> el0t_64_sync_handler+0xc8/0xcc >> el0t_64_sync+0x198/0x19c >> Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) >> ---[ end trace 0000000000000000 ]--- >> >> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> >> --- >> mm/memory_hotplug.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 330668d37e44..9bedecfc3577 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >> (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { >> if (WARN_ON(folio_test_lru(folio))) >> folio_isolate_lru(folio); >> - if (folio_mapped(folio)) >> + if (folio_mapped(folio)) { >> + folio_lock(folio); >> unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); >> + folio_unlock(folio); >> + } > > The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event). > > > If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic? > > I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :) AFAICT during memory_failure, the refcount will not be cleared for poisoned page in order to keep it away from free buddy pages. So move that chunk after the folio_try_get() do seems nice and poisoned free pages do seems weird to me. >
On 14.01.25 10:08, mawupeng wrote: > > > On 2025/1/13 20:35, David Hildenbrand wrote: >> On 13.01.25 09:27, Wupeng Ma wrote: >>> From: Ma Wupeng <mawupeng1@huawei.com> >>> >>> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to >>> be offlined) add page poison checks in do_migrate_range in order to make >>> offline hwpoisoned page possible by introducing isolate_lru_page and >>> try_to_unmap for hwpoisoned page. However folio lock must be held before >>> calling try_to_unmap. Add it to fix this problem. >>> >>> Waring will be produced if filio is not locked during unmap: >>> >>> ------------[ cut here ]------------ >>> kernel BUG at ./include/linux/swapops.h:400! >>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP >>> Modules linked in: >>> CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 >>> Tainted: [W]=WARN >>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 >>> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>> pc : try_to_unmap_one+0xb08/0xd3c >>> lr : try_to_unmap_one+0x3dc/0xd3c >>> Call trace: >>> try_to_unmap_one+0xb08/0xd3c (P) >>> try_to_unmap_one+0x3dc/0xd3c (L) >>> rmap_walk_anon+0xdc/0x1f8 >>> rmap_walk+0x3c/0x58 >>> try_to_unmap+0x88/0x90 >>> unmap_poisoned_folio+0x30/0xa8 >>> do_migrate_range+0x4a0/0x568 >>> offline_pages+0x5a4/0x670 >>> memory_block_action+0x17c/0x374 >>> memory_subsys_offline+0x3c/0x78 >>> device_offline+0xa4/0xd0 >>> state_store+0x8c/0xf0 >>> dev_attr_store+0x18/0x2c >>> sysfs_kf_write+0x44/0x54 >>> kernfs_fop_write_iter+0x118/0x1a8 >>> vfs_write+0x3a8/0x4bc >>> ksys_write+0x6c/0xf8 >>> __arm64_sys_write+0x1c/0x28 >>> invoke_syscall+0x44/0x100 >>> el0_svc_common.constprop.0+0x40/0xe0 >>> do_el0_svc+0x1c/0x28 >>> el0_svc+0x30/0xd0 >>> el0t_64_sync_handler+0xc8/0xcc >>> el0t_64_sync+0x198/0x19c >>> Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) >>> ---[ end trace 0000000000000000 ]--- >>> >>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") >>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> >>> --- >>> mm/memory_hotplug.c | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >>> index 330668d37e44..9bedecfc3577 100644 >>> --- a/mm/memory_hotplug.c >>> +++ b/mm/memory_hotplug.c >>> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >>> (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { >>> if (WARN_ON(folio_test_lru(folio))) >>> folio_isolate_lru(folio); >>> - if (folio_mapped(folio)) >>> + if (folio_mapped(folio)) { >>> + folio_lock(folio); >>> unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); >>> + folio_unlock(folio); >>> + } >> >> The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event). >> >> >> If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic? >> >> I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :) > > AFAICT during memory_failure, the refcount will not be cleared for poisoned page in order to keep it away from free buddy pages. > So move that chunk after the folio_try_get() do seems nice and poisoned free pages do seems weird to me. If the "free" poisoned pages have a raised refcount, folio_try_get() would be able to grab them to essentially skip them (not mapped). If the "free" poisoned pages don't have a raised refcount (impossible right now IIRC), folio_try_get() would skip, them, which would be the right thing to do. So I think that should work.
On 2025/1/13 20:35, David Hildenbrand wrote: > On 13.01.25 09:27, Wupeng Ma wrote: >> From: Ma Wupeng <mawupeng1@huawei.com> >> >> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to >> be offlined) add page poison checks in do_migrate_range in order to make >> offline hwpoisoned page possible by introducing isolate_lru_page and >> try_to_unmap for hwpoisoned page. However folio lock must be held before >> calling try_to_unmap. Add it to fix this problem. >> >> Waring will be produced if filio is not locked during unmap: >> >> ------------[ cut here ]------------ >> kernel BUG at ./include/linux/swapops.h:400! >> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP >> Modules linked in: >> CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 >> Tainted: [W]=WARN >> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 >> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : try_to_unmap_one+0xb08/0xd3c >> lr : try_to_unmap_one+0x3dc/0xd3c >> Call trace: >> try_to_unmap_one+0xb08/0xd3c (P) >> try_to_unmap_one+0x3dc/0xd3c (L) >> rmap_walk_anon+0xdc/0x1f8 >> rmap_walk+0x3c/0x58 >> try_to_unmap+0x88/0x90 >> unmap_poisoned_folio+0x30/0xa8 >> do_migrate_range+0x4a0/0x568 >> offline_pages+0x5a4/0x670 >> memory_block_action+0x17c/0x374 >> memory_subsys_offline+0x3c/0x78 >> device_offline+0xa4/0xd0 >> state_store+0x8c/0xf0 >> dev_attr_store+0x18/0x2c >> sysfs_kf_write+0x44/0x54 >> kernfs_fop_write_iter+0x118/0x1a8 >> vfs_write+0x3a8/0x4bc >> ksys_write+0x6c/0xf8 >> __arm64_sys_write+0x1c/0x28 >> invoke_syscall+0x44/0x100 >> el0_svc_common.constprop.0+0x40/0xe0 >> do_el0_svc+0x1c/0x28 >> el0_svc+0x30/0xd0 >> el0t_64_sync_handler+0xc8/0xcc >> el0t_64_sync+0x198/0x19c >> Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) >> ---[ end trace 0000000000000000 ]--- >> >> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> >> --- >> mm/memory_hotplug.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 330668d37e44..9bedecfc3577 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >> (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { >> if (WARN_ON(folio_test_lru(folio))) >> folio_isolate_lru(folio); >> - if (folio_mapped(folio)) >> + if (folio_mapped(folio)) { >> + folio_lock(folio); >> unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); >> + folio_unlock(folio); >> + } > > The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event). AFAICS, this will open the race window with unpoisoning: unpoison_memory do_migrate_range folio_try_get if (folio_ref_count(folio) > 1) return -EBUSY; But this should be benign one and userspace can simply retry later. Thanks. .
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 330668d37e44..9bedecfc3577 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { if (WARN_ON(folio_test_lru(folio))) folio_isolate_lru(folio); - if (folio_mapped(folio)) + if (folio_mapped(folio)) { + folio_lock(folio); unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); + folio_unlock(folio); + } + continue; }