Message ID | 153074044986.27838.16910122305490506387.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello Dan, On Wed, Jul 04, 2018 at 02:40:49PM -0700, Dan Williams wrote: > The madvise_inject_error() routine uses get_user_pages() to lookup the > pfn and other information for injected error, but it does not release > that pin. The assumption is that failed pages should be taken out of > circulation. > > However, for dax mappings it is not possible to take pages out of > circulation since they are 1:1 physically mapped as filesystem blocks, > or device-dax capacity. They also typically represent persistent memory > which has an error clearing capability. > > In preparation for adding a special handler for dax mappings, shift the > responsibility of taking the page reference to memory_failure(). I.e. > drop the page reference and do not specify MF_COUNT_INCREASED to > memory_failure(). > > Cc: Michal Hocko <mhocko@suse.com> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > mm/madvise.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 4d3c922ea1a1..b731933dddae 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -631,11 +631,13 @@ static int madvise_inject_error(int behavior, > > > for (; start < end; start += PAGE_SIZE << order) { > + unsigned long pfn; > int ret; > > ret = get_user_pages_fast(start, 1, 0, &page); > if (ret != 1) > return ret; > + pfn = page_to_pfn(page); > > /* > * When soft offlining hugepages, after migrating the page > @@ -651,17 +653,27 @@ static int madvise_inject_error(int behavior, > > if (behavior == MADV_SOFT_OFFLINE) { > pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n", > - page_to_pfn(page), start); > + pfn, start); > > ret = soft_offline_page(page, MF_COUNT_INCREASED); > if (ret) > return ret; > continue; > } > + > pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", > - page_to_pfn(page), start); > + pfn, start); > + > + ret = memory_failure(pfn, 0); > + > + /* > + * Drop the page reference taken by get_user_pages_fast(). In > + * the absence of MF_COUNT_INCREASED the memory_failure() > + * routine is responsible for pinning the page to prevent it > + * from being released back to the page allocator. > + */ > + put_page(page); > > - ret = memory_failure(page_to_pfn(page), MF_COUNT_INCREASED); MF_COUNT_INCREASED means that the page refcount for memory error handling is taken by the caller so you don't have to take one inside memory_failure(). So this code don't keep with the definition, then another refcount can be taken in memory_failure() in normal LRU page's case for example. As a result the error message "Memory failure: %#lx: %s still referenced by %d users\n" will be dumped in page_action(). So if you want to put put_page() in madvise_inject_error(), I think that put_page(page); ret = memory_failure(pfn, 0); can be acceptable because the purpose of get_user_pages_fast() here is just getting pfn, and the refcount itself is not so important. IOW, memory_failure() is called only with pfn which never changes depending on the page's status. In production system memory_failure() is called via machine check code without taking any pagecount, so I don't think the this injection interface is properly mocking the real thing. So I'm feeling that this flag will be wiped out at some point. Thanks, Naoya Horiguchi
On Thu, Jul 12, 2018 at 11:31 PM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote: > Hello Dan, > > On Wed, Jul 04, 2018 at 02:40:49PM -0700, Dan Williams wrote: >> The madvise_inject_error() routine uses get_user_pages() to lookup the >> pfn and other information for injected error, but it does not release >> that pin. The assumption is that failed pages should be taken out of >> circulation. >> >> However, for dax mappings it is not possible to take pages out of >> circulation since they are 1:1 physically mapped as filesystem blocks, >> or device-dax capacity. They also typically represent persistent memory >> which has an error clearing capability. >> >> In preparation for adding a special handler for dax mappings, shift the >> responsibility of taking the page reference to memory_failure(). I.e. >> drop the page reference and do not specify MF_COUNT_INCREASED to >> memory_failure(). >> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> mm/madvise.c | 18 +++++++++++++++--- >> 1 file changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/mm/madvise.c b/mm/madvise.c >> index 4d3c922ea1a1..b731933dddae 100644 >> --- a/mm/madvise.c >> +++ b/mm/madvise.c >> @@ -631,11 +631,13 @@ static int madvise_inject_error(int behavior, >> >> >> for (; start < end; start += PAGE_SIZE << order) { >> + unsigned long pfn; >> int ret; >> >> ret = get_user_pages_fast(start, 1, 0, &page); >> if (ret != 1) >> return ret; >> + pfn = page_to_pfn(page); >> >> /* >> * When soft offlining hugepages, after migrating the page >> @@ -651,17 +653,27 @@ static int madvise_inject_error(int behavior, >> >> if (behavior == MADV_SOFT_OFFLINE) { >> pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n", >> - page_to_pfn(page), start); >> + pfn, start); >> >> ret = soft_offline_page(page, MF_COUNT_INCREASED); >> if (ret) >> return ret; >> continue; >> } >> + >> pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", >> - page_to_pfn(page), start); >> + pfn, start); >> + >> + ret = memory_failure(pfn, 0); >> + >> + /* >> + * Drop the page reference taken by get_user_pages_fast(). In >> + * the absence of MF_COUNT_INCREASED the memory_failure() >> + * routine is responsible for pinning the page to prevent it >> + * from being released back to the page allocator. >> + */ >> + put_page(page); >> >> - ret = memory_failure(page_to_pfn(page), MF_COUNT_INCREASED); > > MF_COUNT_INCREASED means that the page refcount for memory error handling > is taken by the caller so you don't have to take one inside memory_failure(). > So this code don't keep with the definition, then another refcount can be > taken in memory_failure() in normal LRU page's case for example. > As a result the error message "Memory failure: %#lx: %s still referenced by > %d users\n" will be dumped in page_action(). > > So if you want to put put_page() in madvise_inject_error(), I think that > > put_page(page); > ret = memory_failure(pfn, 0); > > can be acceptable because the purpose of get_user_pages_fast() here is > just getting pfn, and the refcount itself is not so important. > IOW, memory_failure() is called only with pfn which never changes depending > on the page's status. Ok, I'll resend with the put_page() moved before memory_failure() to make it more clear that memory_failure() is responsible for taking its own reference and that there is no dependency to hold the reference in madvise_inject_error(). > In production system memory_failure() is called via machine check code > without taking any pagecount, so I don't think the this injection interface > is properly mocking the real thing. So I'm feeling that this flag will be > wiped out at some point. Ok, makes sense.
diff --git a/mm/madvise.c b/mm/madvise.c index 4d3c922ea1a1..b731933dddae 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -631,11 +631,13 @@ static int madvise_inject_error(int behavior, for (; start < end; start += PAGE_SIZE << order) { + unsigned long pfn; int ret; ret = get_user_pages_fast(start, 1, 0, &page); if (ret != 1) return ret; + pfn = page_to_pfn(page); /* * When soft offlining hugepages, after migrating the page @@ -651,17 +653,27 @@ static int madvise_inject_error(int behavior, if (behavior == MADV_SOFT_OFFLINE) { pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n", - page_to_pfn(page), start); + pfn, start); ret = soft_offline_page(page, MF_COUNT_INCREASED); if (ret) return ret; continue; } + pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", - page_to_pfn(page), start); + pfn, start); + + ret = memory_failure(pfn, 0); + + /* + * Drop the page reference taken by get_user_pages_fast(). In + * the absence of MF_COUNT_INCREASED the memory_failure() + * routine is responsible for pinning the page to prevent it + * from being released back to the page allocator. + */ + put_page(page); - ret = memory_failure(page_to_pfn(page), MF_COUNT_INCREASED); if (ret) return ret; }
The madvise_inject_error() routine uses get_user_pages() to lookup the pfn and other information for injected error, but it does not release that pin. The assumption is that failed pages should be taken out of circulation. However, for dax mappings it is not possible to take pages out of circulation since they are 1:1 physically mapped as filesystem blocks, or device-dax capacity. They also typically represent persistent memory which has an error clearing capability. In preparation for adding a special handler for dax mappings, shift the responsibility of taking the page reference to memory_failure(). I.e. drop the page reference and do not specify MF_COUNT_INCREASED to memory_failure(). Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- mm/madvise.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)