Message ID: 20200624150137.7052-1-nao.horiguchi@gmail.com (mailing list archive)
Series: HWPOISON: soft offline rework
On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:

> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].
>
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seems
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.

It would be nice to have some sort of overview of the patch series in
this [0/n] email.

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/

The above have such, but are they up to date?
On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:
> [...]
>
> It would be nice to have some sort of overview of the patch series in
> this [0/n] email.
>
> [...]
>
> The above have such, but are they up to date?

The description of the problem doesn't change, but there are some new patches
and some patches are postponed, so I should have added an overview of this
series:

- Patches 1 and 2 are cleanups.
- Patches 3, 4, and 5 change the precondition for calling memory_failure().
  Previously we sometimes called it while holding a refcount on the target
  page and sometimes without, and we passed a flag saying whether the
  refcount had been taken outside memory_failure(). That was confusing and
  made the code more complex than needed.
- Patches 6-10 are cleanups.
- Patch 11 introduces new logic to remove the error page from the buddy
  allocator, which is also applied to the path soft-offlining in-use pages
  in patch 12.
- Patch 13 is basically a refactoring, but I added some adjustment to make
  sure that the freed page is surely sent back to the buddy allocator
  instead of being kept in a pcplist, based on the discussion in v2.
- Patch 14 fixes the inconsistency of return values between the injection
  interfaces.
- Patch 15 is a new patch that adds missing code found during review of the
  previous version.

The core changes are in patches 11 and 12; the others are cleanups and
refactoring.

Thanks,
Naoya Horiguchi
On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> [...]
>
> The description of the problem doesn't change, but there're some new patches
> and some patches are postponed, so I should've added an overview of this series:
>
> [...]
>
> Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.

And all the other words in
https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
are still accurate and complete?
On Wed, Jun 24, 2020 at 03:49:47PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> [...]
>
> And all the other words in
> https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> are still accurate and complete?

Yes, they are.

- Naoya
On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.
> [...]
>
> Any comment/suggestion/help would be appreciated.

Next-20200626 failed to compile due to this series. Reverting the whole
thing [1] will fix the issue below right away,

mm/memory-failure.c: In function ‘__soft_offline_page’:
mm/memory-failure.c:1827:3: error: implicit declaration of function ‘page_handle_poison’; did you mean ‘page_init_poison’? [-Werror=implicit-function-declaration]
   page_handle_poison(page, false, true);
   ^~~~~~~~~~~~~~~~~~
   page_init_poison

.config used,
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

[1] git revert --no-edit f296ba1d3a07..0bd1762119e9

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
>
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
>
> Naoya Horiguchi (7):
>       mm,hwpoison: cleanup unused PageHuge() check
>       mm, hwpoison: remove recalculating hpage
>       mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>       mm,hwpoison-inject: don't pin for hwpoison_filter
>       mm,hwpoison: remove MF_COUNT_INCREASED
>       mm,hwpoison: remove flag argument from soft offline functions
>       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
>
> Oscar Salvador (8):
>       mm,madvise: Refactor madvise_inject_error
>       mm,hwpoison: Un-export get_hwpoison_page and make it static
>       mm,hwpoison: Kill put_hwpoison_page
>       mm,hwpoison: Unify THP handling for hard and soft offline
>       mm,hwpoison: Rework soft offline for free pages
>       mm,hwpoison: Rework soft offline for in-use pages
>       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
>
>  drivers/base/memory.c      |   2 +-
>  include/linux/mm.h         |  12 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |  39 +++---
>  mm/memory-failure.c        | 331 ++++++++++++++++++++-------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  63 +++++++--
>  9 files changed, 233 insertions(+), 252 deletions(-)
On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].

Hi Naoya,

Thanks for dusting this off.
To be honest, I got stuck with the hard offline mode, so this delayed the
resubmission, along with other problems.

> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> [...]

I fully agree.

> Any comment/suggestion/help would be appreciated.

My "new" version included a patch to make sure we give a chance to pages
that are possibly in a pcplist.
The current behavior is that if someone tries to soft-offline such a page,
we return an error because the page count is 0 but the page is not in the
buddy system.

Since this patchset already landed in the mm tree, I could send it as a
standalone patch on top if you agree with it.

My patch looked something like:

From: Oscar Salvador <osalvador@suse.de>
Date: Mon, 29 Jun 2020 12:25:11 +0200
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
 non-buddy zero-refcount page

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page, meaning
that we do not give a chance to handle pcppages.

Fix this by draining pcplists whenever we find this kind of page and
retrying the check. It might be that the pcplists have been spilled
into the buddy allocator and so we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..3aac3f1eeed0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -958,7 +958,7 @@ static int page_action(struct page_state *ps, struct page *p,
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
@@ -988,6 +988,28 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret) {
+		if (!is_free_buddy_page(p) && !page_count(p) && !drained) {
+			/*
+			 * The page might be in a pcplist, so try to drain
+			 * those and see if we are lucky.
+			 */
+			drain_all_pages(page_zone(p));
+			drained = true;
+			goto retry;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.
> [...]
>
> Any comment/suggestion/help would be appreciated.

Even after applying the compile fix,

https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/

madvise(MADV_SOFT_OFFLINE) will fail with EIO on hugetlb where it would
succeed without this series. Steps:

# git clone https://github.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1 (Need at least two NUMA memory nodes)
start: migrate_huge_offline
- use NUMA nodes 0,4.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 4
madvise: Input/output error

(x86.config is also included there.)
[10718.158548][T13039] __get_any_page: 0x1d1600 free huge page
[10718.165684][T13039] Soft offlining pfn 0x1d1600 at process virtual address 0x7f1dd2000000
[10718.175061][T13039] Soft offlining pfn 0x154c00 at process virtual address 0x7f1dd2200000
[10718.185492][T13039] Soft offlining pfn 0x137c00 at process virtual address 0x7f1dd2000000
[10718.195209][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2200000
[10718.203483][T13039] soft offline: 0x4c7a00: hugepage isolation failed: 0, page count 2, type bfffc00001000f (locked|referenced|uptodate|dirty|head)
[10718.218228][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2000000
[10718.227522][T13039] Soft offlining pfn 0x2da800 at process virtual address 0x7f1dd2200000
[10718.238503][T13039] Soft offlining pfn 0x1de200 at process virtual address 0x7f1dd2000000
[10718.247822][T13039] Soft offlining pfn 0x155c00 at process virtual address 0x7f1dd2200000
[10718.259031][T13039] Soft offlining pfn 0x203600 at process virtual address 0x7f1dd2000000
[10718.268504][T13039] Soft offlining pfn 0x417600 at process virtual address 0x7f1dd2200000
[10718.278830][T13039] Soft offlining pfn 0x20a600 at process virtual address 0x7f1dd2000000
[10718.288133][T13039] Soft offlining pfn 0x1d0800 at process virtual address 0x7f1dd2200000
[10718.299198][T13039] Soft offlining pfn 0x3e5800 at process virtual address 0x7f1dd2000000
[10718.308593][T13039] Soft offlining pfn 0x21f200 at process virtual address 0x7f1dd2200000
[10718.319725][T13039] Soft offlining pfn 0x18c600 at process virtual address 0x7f1dd2000000
[10718.329301][T13039] Soft offlining pfn 0x396a00 at process virtual address 0x7f1dd2200000
[10718.378882][T13039] Soft offlining pfn 0x4d5000 at process virtual address 0x7f1dd2000000
[10718.388415][T13039] Soft offlining pfn 0x4e5000 at process virtual address 0x7f1dd2200000
[10718.398807][T13039] Soft offlining pfn 0x2f5000 at process virtual address 0x7f1dd2000000
[10718.408236][T13039] Soft offlining pfn 0x50b400 at process virtual address 0x7f1dd2200000
[10718.419781][T13039] Soft offlining pfn 0x396800 at process virtual address 0x7f1dd2000000
[10718.429677][T13039] Soft offlining pfn 0xd69c00 at process virtual address 0x7f1dd2200000
[10718.440435][T13039] Soft offlining pfn 0x21f000 at process virtual address 0x7f1dd2000000
[10718.450341][T13039] Soft offlining pfn 0x399400 at process virtual address 0x7f1dd2200000
[10718.458768][T13039] __get_any_page: 0x399400: unknown zero refcount page type bfffc000000000

The main part is,

https://github.com/cailca/linux-mm/blob/master/random.c#L372

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> [...]
On Tue, 2020-06-30 at 01:08 -0400, Qian Cai wrote:
> [...]
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
>
> [...]
> madvise: Input/output error

I think I know why.
It's been a while since I took a look, but I compared the posted patchset
with the newest patchset I had ready, and I saw that I had made some
changes with regard to hugetlb pages.

I will take a look, although it might be better to re-post the patchset
instead of adding a fix on top, since the changes are a bit substantial.

Thanks for reporting.
On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote:
> [...]
>
> My "new" version included a patch to make sure we give a chance to
> pages that possibly are in a pcplist.
>
> [...]
>
> Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
>  non-buddy zero-refcount page
>
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> [...]
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

I'm fine with this patch, so it will go into the next version with some
minor changes, like moving the function comment block along with the
function.

Actually, I feel we could refactor get_hwpoison_page() together with
get_any_page() to handle this kind of pcplist-related issue in one place.
But that could be more important in the hard-offline rework, and we don't
have to do it in this series.

Thanks,
Naoya Horiguchi
On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> [...]
> > madvise: Input/output error
>
> I think I know why.
> [...]
> I will be taking a look, although it might be better to re-post the
> patchset instead of adding a fix on top since the changes are a bit
> substantial.

Yap, I just confirmed this. The posted version had some flaws wrt. hugetlb
pages; that is why I was working on a v3.
I am rebasing my v3 on top of the mm tree with the posted patchset
reverted. I expect to post it on Friday.
On Wed, Jul 01, 2020 at 10:22:07AM +0200, Oscar Salvador wrote:
> [...]
>
> Yap, I just confirmed this.

I've reproduced the EIO, but I'm still not sure how this happens ...

> Posted version had some flaws wrt. hugetlb pages, that is why I was
> working on a v3.
> I am rebasing my v3 on top of the mm with the posted patchset reverted.

OK, thank you for offering to work on v4, so I'll wait for it for testing.

Thanks,
Naoya Horiguchi
On Tue, Jun 30, 2020 at 01:08:03AM -0400, Qian Cai wrote:
> [...]
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series.
> [...]
> madvise: Input/output error

Ok, sorry for the lateness, but I had to re-fetch the code into my brain
again. I just finished v4 of this patchset, and it seems this problem is
gone:

# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,1.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 1
- pass: mmap_offline_node_huge
- start: hotplug_memory
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
- pass: hotplug_memory

The test seems to succeed, and there is no crash on the kernel side.
I will just run some more tests to make sure the thing is solid enough,
and then I will post v4.

Thanks