Message ID | 20210614021212.223326-3-nao.horiguchi@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/hwpoison: fix unpoison_memory() | expand |
On 2021/6/14 10:12, Naoya Horiguchi wrote: > @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn) > goto unlock_mutex; > } > > - /* > - * unpoison_memory() can encounter thp only when the thp is being > - * worked by memory_failure() and the page lock is not held yet. > - * In such case, we yield to memory_failure() and make unpoison fail. > - */ > - if (!PageHuge(page) && PageTransHuge(page)) { > - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", > - pfn, &unpoison_rs); > - goto unlock_mutex; > - } > - if a huge page is in process of alloc or free, HUGETLB_PAGE_DTOR can be set after __SetPageHead() or be cleared before __ClearPageHead(), so this condition may be true in racy. Do we need the racy test for this situation? > if (!get_hwpoison_page(p, flags)) { > if (TestClearPageHWPoison(p)) > num_poisoned_pages_dec();
On Tue, Jun 15, 2021 at 08:57:06PM +0800, Ding Hui wrote: > On 2021/6/14 10:12, Naoya Horiguchi wrote: > > @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn) > > goto unlock_mutex; > > } > > - /* > > - * unpoison_memory() can encounter thp only when the thp is being > > - * worked by memory_failure() and the page lock is not held yet. > > - * In such case, we yield to memory_failure() and make unpoison fail. > > - */ > > - if (!PageHuge(page) && PageTransHuge(page)) { > > - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", > > - pfn, &unpoison_rs); > > - goto unlock_mutex; > > - } > > - > > if a huge page is in process of alloc or free, HUGETLB_PAGE_DTOR can be set > after __SetPageHead() or be cleared before __ClearPageHead(), so this > condition may be true in racy. Hi Ding, We confirm PageHWPoison() before reaching this if-block and hwpoisoned pages are prohibited from allocation, so it seems to me that this check never races with hugetlb allocation. And according to the original patch introduced this if-block (0cea3fdc416d: "mm/hwpoison: fix race against poison thp"), this if-block intended to close the race between memory_failure() and unpoison_memory(), so that's no longer necessary due to mf_mutex. > Do we need the racy test for this situation? I'm not sure, but I think that we need more stress/fuzz testing focusing on this subsystem, and "unpoison vs allocation" race can be covered in the topic. Thank you, Naoya Horiguchi
On 2021/6/16 8:11, HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Jun 15, 2021 at 08:57:06PM +0800, Ding Hui wrote: >> On 2021/6/14 10:12, Naoya Horiguchi wrote: >>> @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn) >>> goto unlock_mutex; >>> } >>> - /* >>> - * unpoison_memory() can encounter thp only when the thp is being >>> - * worked by memory_failure() and the page lock is not held yet. >>> - * In such case, we yield to memory_failure() and make unpoison fail. >>> - */ >>> - if (!PageHuge(page) && PageTransHuge(page)) { >>> - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", >>> - pfn, &unpoison_rs); >>> - goto unlock_mutex; >>> - } >>> - >> >> if a huge page is in process of alloc or free, HUGETLB_PAGE_DTOR can be set >> after __SetPageHead() or be cleared before __ClearPageHead(), so this >> condition may be true in racy. > > Hi Ding, > > We confirm PageHWPoison() before reaching this if-block and hwpoisoned pages > are prohibited from allocation, so it seems to me that this check never > races with hugetlb allocation. > > And according to the original patch introduced this if-block (0cea3fdc416d: > "mm/hwpoison: fix race against poison thp"), this if-block intended to close > the race between memory_failure() and unpoison_memory(), so that's no longer > necessary due to mf_mutex. > I got it and thanks for your explanation. >> Do we need the racy test for this situation? > > I'm not sure, but I think that we need more stress/fuzz testing focusing on > this subsystem, and "unpoison vs allocation" race can be covered in the topic. > > Thank you, > Naoya Horiguchi >
diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c index 280eb6d6dd15..e7910386fc9c 100644 --- v5.13-rc5/mm/memory-failure.c +++ v5.13-rc5_patched/mm/memory-failure.c @@ -1461,14 +1461,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) lock_page(head); page_flags = head->flags; - if (!PageHWPoison(head)) { - pr_err("Memory failure: %#lx: just unpoisoned\n", pfn); - num_poisoned_pages_dec(); - unlock_page(head); - put_page(head); - return 0; - } - /* * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so * simply disable it. In order to make it work properly, we need @@ -1730,16 +1722,6 @@ int memory_failure(unsigned long pfn, int flags) */ page_flags = p->flags; - /* - * unpoison always clear PG_hwpoison inside page lock - */ - if (!PageHWPoison(p)) { - pr_err("Memory failure: %#lx: just unpoisoned\n", pfn); - num_poisoned_pages_dec(); - unlock_page(p); - put_page(p); - goto unlock_mutex; - } if (hwpoison_filter(p)) { if (TestClearPageHWPoison(p)) num_poisoned_pages_dec(); @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; } - /* - * unpoison_memory() can encounter thp only when the thp is being - * worked by memory_failure() and the page lock is not held yet. - * In such case, we yield to memory_failure() and make unpoison fail. - */ - if (!PageHuge(page) && PageTransHuge(page)) { - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", - pfn, &unpoison_rs); - goto unlock_mutex; - } - if (!get_hwpoison_page(p, flags)) { if (TestClearPageHWPoison(p)) num_poisoned_pages_dec(); @@ -1975,20 +1946,12 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; } - lock_page(page); - /* - * This test is racy because PG_hwpoison is set outside of page lock. - * That's acceptable because that won't trigger kernel panic. Instead, - * the PG_hwpoison page will be caught and isolated on the entrance to - * the free buddy page pool. - */ if (TestClearPageHWPoison(page)) { unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n", pfn, &unpoison_rs); num_poisoned_pages_dec(); freeit = 1; } - unlock_page(page); put_page(page); if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))