Message ID: 20200624150137.7052-1-nao.horiguchi@gmail.com (mailing list archive)
Series: HWPOISON: soft offline rework
On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:

> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].
>
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seems
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.

It would be nice to have some sort of overview of the patch series in
this [0/n] email.

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/

The above have such, but are they up to date?
On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:
> [...]
>
> It would be nice to have some sort of overview of the patch series in
> this [0/n] email.
>
> [...]
>
> The above have such, but are they up to date?

The description of the problem doesn't change, but there are some new patches
and some patches are postponed, so I should have added an overview of this
series:

- Patches 1 and 2 are cleanups.
- Patches 3, 4, and 5 change the precondition for calling memory_failure().
  Previously we sometimes called it while holding a refcount on the target
  page and sometimes without, and we passed a flag saying whether the
  refcount had been taken outside memory_failure(). That was confusing and
  made the code more complex than needed.
- Patches 6-10 are cleanups.
- Patch 11 introduces new logic to remove the error page from the buddy
  allocator, which is also applied to the path soft-offlining in-use pages
  in patch 12.
- Patch 13 is basically a refactoring, but I added some adjustment to make
  sure that the freed page is surely sent back to the buddy allocator
  instead of being kept in a pcplist, based on the discussion in v2.
- Patch 14 fixes the inconsistency of return values between the injection
  interfaces.
- Patch 15 is a new patch that adds missing code found during review of the
  previous version.

The core changes are in patches 11 and 12; the others are cleanups and
refactoring.

Thanks,
Naoya Horiguchi
On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> [...]
>
> The description of the problem doesn't change, but there're some new patches
> and some patches are postponed, so I should've added an overview of this series:
>
> [...]
>
> Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.

And all the other words in
https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
are still accurate and complete?
On Wed, Jun 24, 2020 at 03:49:47PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> [...]
>
> And all the other words in
> https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> are still accurate and complete?

Yes, they are.

- Naoya
On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.
> [...]
>
> Any comment/suggestion/help would be appreciated.

Next-20200626 failed to compile due to this series. Reverting the whole
thing [1] will fix the issue below right away,

mm/memory-failure.c: In function ‘__soft_offline_page’:
mm/memory-failure.c:1827:3: error: implicit declaration of function ‘page_handle_poison’; did you mean ‘page_init_poison’? [-Werror=implicit-function-declaration]
   page_handle_poison(page, false, true);
   ^~~~~~~~~~~~~~~~~~
   page_init_poison

.config used,
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

[1] git revert --no-edit f296ba1d3a07..0bd1762119e9

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
>
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
>
> Naoya Horiguchi (7):
>       mm,hwpoison: cleanup unused PageHuge() check
>       mm, hwpoison: remove recalculating hpage
>       mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>       mm,hwpoison-inject: don't pin for hwpoison_filter
>       mm,hwpoison: remove MF_COUNT_INCREASED
>       mm,hwpoison: remove flag argument from soft offline functions
>       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
>
> Oscar Salvador (8):
>       mm,madvise: Refactor madvise_inject_error
>       mm,hwpoison: Un-export get_hwpoison_page and make it static
>       mm,hwpoison: Kill put_hwpoison_page
>       mm,hwpoison: Unify THP handling for hard and soft offline
>       mm,hwpoison: Rework soft offline for free pages
>       mm,hwpoison: Rework soft offline for in-use pages
>       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
>
>  drivers/base/memory.c      |   2 +-
>  include/linux/mm.h         |  12 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |  39 +++---
>  mm/memory-failure.c        | 331 ++++++++++++++++++++-------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  63 +++++++--
>  9 files changed, 233 insertions(+), 252 deletions(-)
On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].

Hi Naoya,

Thanks for dusting this off.
To be honest, I got stuck with the hard offline mode, so this delayed the
resubmission, along with other problems.

> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> [...]

I fully agree.

> Any comment/suggestion/help would be appreciated.

My "new" version included a patch to make sure we give a chance to pages
that are possibly in a pcplist.
The current behavior is that if someone tries to soft-offline such a page,
we return an error because the page count is 0 but the page is not in the
buddy system.

Since this patchset already landed in the mm tree, I could send it as a
standalone patch on top if you agree with it.

My patch looked something like:

From: Oscar Salvador <osalvador@suse.de>
Date: Mon, 29 Jun 2020 12:25:11 +0200
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
 non-buddy zero-refcount page

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page, meaning
that we do not give a chance to handle pcppages.

Fix this by draining pcplists whenever we find this kind of page and
retrying the check. It might be that the pcplists have been spilled
into the buddy allocator and so we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..3aac3f1eeed0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -958,7 +958,7 @@ static int page_action(struct page_state *ps, struct page *p,
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
@@ -988,6 +988,28 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret) {
+		if (!is_free_buddy_page(p) && !page_count(p) && !drained) {
+			/*
+			 * The page might be in a pcplist, so try to drain
+			 * those and see if we are lucky.
+			 */
+			drain_all_pages(page_zone(p));
+			drained = true;
+			goto retry;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.
> [...]
>
> Any comment/suggestion/help would be appreciated.

Even after applying the compile fix,

https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/

madvise(MADV_SOFT_OFFLINE) will fail with EIO on hugetlb where it would
succeed without this series. Steps:

# git clone https://github.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1 (Need at least two NUMA memory nodes)
start: migrate_huge_offline
- use NUMA nodes 0,4.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 4
madvise: Input/output error

(x86.config is also included there.)
[10718.158548][T13039] __get_any_page: 0x1d1600 free huge page
[10718.165684][T13039] Soft offlining pfn 0x1d1600 at process virtual address 0x7f1dd2000000
[10718.175061][T13039] Soft offlining pfn 0x154c00 at process virtual address 0x7f1dd2200000
[10718.185492][T13039] Soft offlining pfn 0x137c00 at process virtual address 0x7f1dd2000000
[10718.195209][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2200000
[10718.203483][T13039] soft offline: 0x4c7a00: hugepage isolation failed: 0, page count 2, type bfffc00001000f (locked|referenced|uptodate|dirty|head)
[10718.218228][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2000000
[10718.227522][T13039] Soft offlining pfn 0x2da800 at process virtual address 0x7f1dd2200000
[10718.238503][T13039] Soft offlining pfn 0x1de200 at process virtual address 0x7f1dd2000000
[10718.247822][T13039] Soft offlining pfn 0x155c00 at process virtual address 0x7f1dd2200000
[10718.259031][T13039] Soft offlining pfn 0x203600 at process virtual address 0x7f1dd2000000
[10718.268504][T13039] Soft offlining pfn 0x417600 at process virtual address 0x7f1dd2200000
[10718.278830][T13039] Soft offlining pfn 0x20a600 at process virtual address 0x7f1dd2000000
[10718.288133][T13039] Soft offlining pfn 0x1d0800 at process virtual address 0x7f1dd2200000
[10718.299198][T13039] Soft offlining pfn 0x3e5800 at process virtual address 0x7f1dd2000000
[10718.308593][T13039] Soft offlining pfn 0x21f200 at process virtual address 0x7f1dd2200000
[10718.319725][T13039] Soft offlining pfn 0x18c600 at process virtual address 0x7f1dd2000000
[10718.329301][T13039] Soft offlining pfn 0x396a00 at process virtual address 0x7f1dd2200000
[10718.378882][T13039] Soft offlining pfn 0x4d5000 at process virtual address 0x7f1dd2000000
[10718.388415][T13039] Soft offlining pfn 0x4e5000 at process virtual address 0x7f1dd2200000
[10718.398807][T13039] Soft offlining pfn 0x2f5000 at process virtual address 0x7f1dd2000000
[10718.408236][T13039] Soft offlining pfn 0x50b400 at process virtual address 0x7f1dd2200000
[10718.419781][T13039] Soft offlining pfn 0x396800 at process virtual address 0x7f1dd2000000
[10718.429677][T13039] Soft offlining pfn 0xd69c00 at process virtual address 0x7f1dd2200000
[10718.440435][T13039] Soft offlining pfn 0x21f000 at process virtual address 0x7f1dd2000000
[10718.450341][T13039] Soft offlining pfn 0x399400 at process virtual address 0x7f1dd2200000
[10718.458768][T13039] __get_any_page: 0x399400: unknown zero refcount page type bfffc000000000

The main part is,

https://github.com/cailca/linux-mm/blob/master/random.c#L372

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> [...]
On Tue, 2020-06-30 at 01:08 -0400, Qian Cai wrote:
> [...]
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
>
> [...]
> madvise: Input/output error

I think I know why.
It's been a while since I took a look, but I compared the posted patchset
with the newest patchset I had ready, and I saw that I had made some
changes with regard to hugetlb pages.

I will take a look, although it might be better to re-post the patchset
instead of adding a fix on top, since the changes are a bit substantial.

Thanks for reporting.
On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote:
> [...]
>
> My "new" version included a patch to make sure we give a chance to
> pages that possibly are in a pcplist.
>
> [...]
>
> Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
>  non-buddy zero-refcount page
>
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> [...]
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

I'm fine with this patch, so it will go into the next version with some
minor changes, like moving the function comment block along with the
function.

Actually, I feel we could refactor get_hwpoison_page() together with
get_any_page() to handle this kind of pcplist-related issue in one place.
But that could be more important in the hard-offline rework, and we don't
have to do it in this series.

Thanks,
Naoya Horiguchi
On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> [...]
> > madvise: Input/output error
>
> I think I know why.
> [...]
> I will be taking a look, although it might be better to re-post the
> patchset instead of adding a fix on top since the changes are a bit
> substantial.

Yap, I just confirmed this. The posted version had some flaws wrt. hugetlb
pages; that is why I was working on a v3.
I am rebasing my v3 on top of the mm tree with the posted patchset
reverted. I expect to post it on Friday.
On Wed, Jul 01, 2020 at 10:22:07AM +0200, Oscar Salvador wrote:
> [...]
>
> Yap, I just confirmed this.

I've reproduced the EIO, but I'm still not sure how this happens ...

> Posted version had some flaws wrt. hugetlb pages, that is why I was
> working on a v3.
> I am rebasing my v3 on top of the mm with the posted patchset reverted.

OK, thank you for offering to work on v4, so I'll wait for it for testing.

Thanks,
Naoya Horiguchi
On Tue, Jun 30, 2020 at 01:08:03AM -0400, Qian Cai wrote:
> [...]
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series.
> [...]
> madvise: Input/output error

Ok, sorry for the lateness, but I had to re-fetch the code into my brain
again. I just finished v4 of this patchset, and it seems this problem is
gone:

# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,1.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 1
- pass: mmap_offline_node_huge
- start: hotplug_memory
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
- pass: hotplug_memory

The test seems to succeed, and there is no crash on the kernel side.
I will just run some more tests to make sure the thing is solid enough,
and then I will post v4.

Thanks