diff mbox series

[v2,1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails

Message ID 1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.com (mailing list archive)
State New, archived
Headers show
Series fix return value issue of soft offlining hugepages | expand

Commit Message

Naoya Horiguchi June 10, 2019, 8:18 a.m. UTC
The pass/fail of soft offline should be judged by checking whether the
raw error page was finally contained or not (i.e. the result of
set_hwpoison_free_buddy_page()), but current code do not work like that.
So this patch is suggesting to fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <stable@vger.kernel.org> # v4.19+
---
 mm/memory-failure.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andrew Morton June 10, 2019, 9:20 p.m. UTC | #1
On Mon, 10 Jun 2019 17:18:05 +0900 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.

Please describe the user-visible runtime effects of this change?
Naoya Horiguchi June 10, 2019, 10:51 p.m. UTC | #2
On Mon, Jun 10, 2019 at 02:20:33PM -0700, Andrew Morton wrote:
> On Mon, 10 Jun 2019 17:18:05 +0900 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > The pass/fail of soft offline should be judged by checking whether the
> > raw error page was finally contained or not (i.e. the result of
> > set_hwpoison_free_buddy_page()), but current code do not work like that.
> > So this patch is suggesting to fix it.
> 
> Please describe the user-visible runtime effects of this change?

Sorry, could you replace the description as follows (I inserted one sentence)?

    The pass/fail of soft offline should be judged by checking whether the
    raw error page was finally contained or not (i.e. the result of
    set_hwpoison_free_buddy_page()), but current code do not work like that.
    It might lead us to misjudge the test result when
    set_hwpoison_free_buddy_page() fails.  So this patch is suggesting to
    fix it.

Thanks,
Naoya Horiguchi
Mike Kravetz June 11, 2019, 12:19 a.m. UTC | #3
On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <stable@vger.kernel.org> # v4.19+

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

To follow-up on Andrew's comment/question about user visible effects.  Without
this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
original page and will not return an error.  Are there any other visible
effects?
Naoya Horiguchi June 11, 2019, 12:57 a.m. UTC | #4
On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> > The pass/fail of soft offline should be judged by checking whether the
> > raw error page was finally contained or not (i.e. the result of
> > set_hwpoison_free_buddy_page()), but current code do not work like that.
> > So this patch is suggesting to fix it.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> > Cc: <stable@vger.kernel.org> # v4.19+
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thank you, Mike.

> 
> To follow-up on Andrew's comment/question about user visible effects.  Without
> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
> original page and will not return an error.

Yes, that's right.

>  Are there any other visible
> effects?

I can't think of other ones.

- Naoya
Anshuman Khandual June 11, 2019, 8:14 a.m. UTC | #5
On 06/11/2019 06:27 AM, Naoya Horiguchi wrote:
> On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
>> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
>>> The pass/fail of soft offline should be judged by checking whether the
>>> raw error page was finally contained or not (i.e. the result of
>>> set_hwpoison_free_buddy_page()), but current code do not work like that.
>>> So this patch is suggesting to fix it.
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
>>> Cc: <stable@vger.kernel.org> # v4.19+
>>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Thank you, Mike.
> 
>>
>> To follow-up on Andrew's comment/question about user visible effects.  Without
>> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
>> original page and will not return an error.
> 
> Yes, that's right.

Then should this be included in the commit message as well ?
Naoya Horiguchi June 12, 2019, 6:48 a.m. UTC | #6
On Tue, Jun 11, 2019 at 01:44:46PM +0530, Anshuman Khandual wrote:
> 
> 
> On 06/11/2019 06:27 AM, Naoya Horiguchi wrote:
> > On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
> >> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> >>> The pass/fail of soft offline should be judged by checking whether the
> >>> raw error page was finally contained or not (i.e. the result of
> >>> set_hwpoison_free_buddy_page()), but current code do not work like that.
> >>> So this patch is suggesting to fix it.
> >>>
> >>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> >>> Cc: <stable@vger.kernel.org> # v4.19+
> >>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > Thank you, Mike.
> > 
> >>
> >> To follow-up on Andrew's comment/question about user visible effects.  Without
> >> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
> >> original page and will not return an error.
> > 
> > Yes, that's right.
> 
> Then should this be included in the commit message as well ?

Right, I'll clarify the point in the description.

Thanks,
- Naoya
diff mbox series

Patch

diff --git v5.2-rc3/mm/memory-failure.c v5.2-rc3_patched/mm/memory-failure.c
index fc8b517..7ea485e 100644
--- v5.2-rc3/mm/memory-failure.c
+++ v5.2-rc3_patched/mm/memory-failure.c
@@ -1733,6 +1733,8 @@  static int soft_offline_huge_page(struct page *page, int flags)
 		if (!ret) {
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	}
 	return ret;