Message ID: 20210609072029.74645-1-nao.horiguchi@gmail.com (mailing list archive)
State: New, archived
Series: [v3] mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
On Fri, 11 Jun 2021 00:23:29 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> > --- mm/memory-failure.c
> > +++ mm/memory-failure.c
> > @@ -1782,6 +1796,8 @@ int memory_failure(unsigned long pfn, int flags)
> > 
> >  identify_page_state:
> >  	res = identify_page_state(pfn, p, page_flags);
> > +	mutex_unlock(&mf_mutex);
> > +	return res;
> >  unlock_page:
> >  	unlock_page(p);
> >  unlock_mutex:
> > 
> > and... That mutex_unlock() looks odd.  The patch adds no matching
> > mutex_lock?
> 
> Yes, memory_failure() already has one mutex_lock (introduced by
> mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch,
> sorry for not clarifying that), and the change introduces a separate
> return path.  But I now think that I should have used "goto unlock_mutex"
> to use the existing return path.

But mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
is part of Tony's three patch series which is not marked for -stable.
So it isn't appropriate that this patch be based on top of that three
patch series.

Should Tony's patchset also be targeted to -stable?  If so then OK.  If
not then please let's prepare your -stable patch against current
mainline, as it is higher priority than the 5.14-rc1 material in
linux-next.
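The style question above (an inline `mutex_unlock()`/`return` pair vs. "goto unlock_mutex" to the existing exit path) can be illustrated outside the kernel. The following is a minimal user-space sketch, not the real memory_failure() code: the names `mf_mutex_held`, `mutex_lock_stub()`, `mutex_unlock_stub()`, and `identify_page_state_stub()` are invented stand-ins, and the real function also juggles the page lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of mf_mutex: a flag instead of a real mutex. */
static bool mf_mutex_held;

static void mutex_lock_stub(void)   { mf_mutex_held = true; }
static void mutex_unlock_stub(void) { mf_mutex_held = false; }

static int identify_page_state_stub(void) { return 0; }

/* Style 1: the v3 patch's separate return path, unlocking inline. */
static int memory_failure_separate_return(void)
{
        int res;

        mutex_lock_stub();
        res = identify_page_state_stub();
        /* pairs with the mutex_lock_stub() far above; easy to overlook */
        mutex_unlock_stub();
        return res;
}

/* Style 2: branch to one shared unlock label, as suggested in the reply. */
static int memory_failure_goto(void)
{
        int res;

        mutex_lock_stub();
        res = identify_page_state_stub();
        goto unlock_mutex;

unlock_mutex:
        mutex_unlock_stub();
        return res;
}
```

The goto style keeps a single unlock site at the bottom of the function, so a reviewer checking lock balance only has to audit one exit path, which is why the unmatched-looking mutex_unlock() drew a question.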
On Mon, Jun 14, 2021 at 07:35:28PM -0700, Andrew Morton wrote:
> On Fri, 11 Jun 2021 00:23:29 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> 
> > > --- mm/memory-failure.c
> > > +++ mm/memory-failure.c
> > > @@ -1782,6 +1796,8 @@ int memory_failure(unsigned long pfn, int flags)
> > > 
> > >  identify_page_state:
> > >  	res = identify_page_state(pfn, p, page_flags);
> > > +	mutex_unlock(&mf_mutex);
> > > +	return res;
> > >  unlock_page:
> > >  	unlock_page(p);
> > >  unlock_mutex:
> > > 
> > > and... That mutex_unlock() looks odd.  The patch adds no matching
> > > mutex_lock?
> > 
> > Yes, memory_failure() already has one mutex_lock (introduced by
> > mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch,
> > sorry for not clarifying that), and the change introduces a separate
> > return path.  But I now think that I should have used "goto unlock_mutex"
> > to use the existing return path.
> 
> But mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
> is part of Tony's three patch series which is not marked for -stable.
> So it isn't appropriate that this patch be based on top of that three
> patch series.
> 
> Should Tony's patchset also be targeted to -stable?  If so then OK.

Yes, that's fine.  And I think that the first two patches
(mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch and
mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch)
can be marked for stable, but 3/3 (mmhwpoison-send-sigbus-with-error-virutal-address.patch)
may not, because it's a little too large and not the main part of the fixes.

Thanks,
Naoya Horiguchi

> If not then please let's prepare your -stable patch against current
> mainline, as it is higher priority than the 5.14-rc1 material in
> linux-next.
On Tue, 15 Jun 2021 03:58:47 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> > Should Tony's patchset also be targeted to -stable?  If so then OK.
> 
> Yes, that's fine.  And I think that the first two patches
> (mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch and
> mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch)
> can be marked to stable, but 3/3 (mmhwpoison-send-sigbus-with-error-virutal-address.patch)
> may not because it's a little too large and not the main part of the fixes.

OK, thanks, I queued

mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races-fix.patch
mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch
# mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers.patch

for 5.13 with cc:stable and

mmhwpoison-send-sigbus-with-error-virutal-address.patch
mmhwpoison-send-sigbus-with-error-virutal-address-fix.patch

for 5.14.
diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 763955dd6431..e5a1531f7f4e 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -801,6 +801,7 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
  */
 static int me_kernel(struct page *p, unsigned long pfn)
 {
+	unlock_page(p);
 	return MF_IGNORED;
 }
 
@@ -810,6 +811,7 @@ static int me_kernel(struct page *p, unsigned long pfn)
 static int me_unknown(struct page *p, unsigned long pfn)
 {
 	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+	unlock_page(p);
 	return MF_FAILED;
 }
 
@@ -818,6 +820,7 @@ static int me_unknown(struct page *p, unsigned long pfn)
  */
 static int me_pagecache_clean(struct page *p, unsigned long pfn)
 {
+	int ret;
 	struct address_space *mapping;
 
 	delete_from_lru_cache(p);
@@ -826,8 +829,10 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 * For anonymous pages we're done the only reference left
 	 * should be the one m_f() holds.
 	 */
-	if (PageAnon(p))
-		return MF_RECOVERED;
+	if (PageAnon(p)) {
+		ret = MF_RECOVERED;
+		goto out;
+	}
 
 	/*
 	 * Now truncate the page in the page cache. This is really
@@ -841,7 +846,8 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 		/*
 		 * Page has been teared down in the meanwhile
 		 */
-		return MF_FAILED;
+		ret = MF_FAILED;
+		goto out;
 	}
 
 	/*
@@ -849,7 +855,10 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 *
 	 * Open: to take i_mutex or not for this? Right now we don't.
 	 */
-	return truncate_error_page(p, pfn, mapping);
+	ret = truncate_error_page(p, pfn, mapping);
+out:
+	unlock_page(p);
+	return ret;
 }
 
 /*
@@ -925,24 +934,26 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
  */
 static int me_swapcache_dirty(struct page *p, unsigned long pfn)
 {
+	int ret;
+
 	ClearPageDirty(p);
 	/* Trigger EIO in shmem: */
 	ClearPageUptodate(p);
 
-	if (!delete_from_lru_cache(p))
-		return MF_DELAYED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
+	unlock_page(p);
+	return ret;
 }
 
 static int me_swapcache_clean(struct page *p, unsigned long pfn)
 {
+	int ret;
+
 	delete_from_swap_cache(p);
-	if (!delete_from_lru_cache(p))
-		return MF_RECOVERED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
+	unlock_page(p);
+	return ret;
 }
 
 /*
@@ -963,6 +974,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 	mapping = page_mapping(hpage);
 	if (mapping) {
 		res = truncate_error_page(hpage, pfn, mapping);
+		unlock_page(hpage);
 	} else {
 		res = MF_FAILED;
 		unlock_page(hpage);
@@ -977,7 +989,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-		lock_page(hpage);
 	}
 
 	return res;
@@ -1009,6 +1020,8 @@ static struct page_state {
 	unsigned long mask;
 	unsigned long res;
 	enum mf_action_page_type type;
+
+	/* Callback ->action() has to unlock the relevant page inside it. */
 	int (*action)(struct page *p, unsigned long pfn);
 } error_states[] = {
 	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
@@ -1072,6 +1085,7 @@ static int page_action(struct page_state *ps, struct page *p,
 	int result;
 	int count;
 
+	/* page p should be unlocked after returning from ps->action(). */
 	result = ps->action(p, pfn);
 	count = page_count(p) - 1;
 
@@ -1476,7 +1490,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 		goto out;
 	}
 
-	res = identify_page_state(pfn, p, page_flags);
+	return identify_page_state(pfn, p, page_flags);
 out:
 	unlock_page(head);
 	return res;
@@ -1768,6 +1782,8 @@ int memory_failure(unsigned long pfn, int flags)
 
 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
+	mutex_unlock(&mf_mutex);
+	return res;
 unlock_page:
 	unlock_page(p);
 unlock_mutex:
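The net effect of the diff above is a changed contract: every error_states[].action() handler now unlocks the page itself, so callers such as page_action() (and me_huge_page()'s recovery path, which no longer re-takes the lock) can rely on the page being unlocked on return. The following is a toy user-space model of that contract, not kernel code: the `struct page`, `unlock_page()`, and handler here are simplified stand-ins with invented bodies.

```c
#include <assert.h>
#include <stdbool.h>

#define MF_IGNORED 0

/* Simplified stand-in for the kernel's struct page: just a lock flag. */
struct page { bool locked; };

static void unlock_page(struct page *p) { p->locked = false; }

/* Mirrors me_kernel() after the patch: unlock before returning. */
static int me_kernel(struct page *p, unsigned long pfn)
{
	(void)pfn;
	unlock_page(p);
	return MF_IGNORED;
}

/* Caller side: page p should be unlocked after action() returns. */
static int page_action(int (*action)(struct page *, unsigned long),
		       struct page *p, unsigned long pfn)
{
	int result = action(p, pfn);

	assert(!p->locked);	/* the invariant the diff above establishes */
	return result;
}
```

With this split of responsibility, handlers that must drop the page lock early (as me_pagecache_clean() does before truncation completes) no longer force the caller to guess whether the lock is still held.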