Message ID | 20220325161428.5068d97e@imladris.surriel.com (mailing list archive)
---|---
State | New
Series | mm,hwpoison: unmap poisoned page before invalidation
On 2022/3/26 4:14, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org
> ---
>  mm/memory.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  		return ret;
>
>  	if (unlikely(PageHWPoison(vmf->page))) {
> +		struct page *page = vmf->page;
>  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
>  		if (ret & VM_FAULT_LOCKED) {
> +			if (page_mapped(page))
> +				unmap_mapping_pages(page_mapping(page),
> +						    page->index, 1, false);

It seems this unmap_mapping_pages also helps the success rate of the
invalidate_inode_page call below.

>  			/* Retry if a clean page was removed from the cache. */
> -			if (invalidate_inode_page(vmf->page))
> -				poisonret = 0;
> -			unlock_page(vmf->page);
> +			if (invalidate_inode_page(page))
> +				poisonret = VM_FAULT_NOPAGE;
> +			unlock_page(page);
>  		}
> -		put_page(vmf->page);
> +		put_page(page);

Do we use page instead of vmf->page just for simplicity? Or is there
some other concern?

> 		vmf->page = NULL;

We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with
vmf->page = NULL. In either case, finish_fault won't be called later.
So I think your fix is right.

> 		return poisonret;
>  	}

Many thanks for your patch.
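For context: the reason VM_FAULT_NOPAGE prevents the finish_fault oops
described in the changelog is visible in the caller. A simplified sketch
of do_read_fault() from mm/memory.c around this kernel version
(fault-around and put_page accounting details omitted; not the verbatim
mainline code):

static vm_fault_t do_read_fault(struct vm_fault *vmf)
{
	vm_fault_t ret;

	ret = __do_fault(vmf);
	/*
	 * VM_FAULT_NOPAGE is in this mask, so the hwpoison branch that
	 * just set vmf->page = NULL makes do_read_fault() return here...
	 */
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
		return ret;

	/*
	 * ...and never reach finish_fault(), which would otherwise go
	 * on to dereference the now-NULL vmf->page.
	 */
	ret |= finish_fault(vmf);
	unlock_page(vmf->page);
	return ret;
}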
On Sat, 2022-03-26 at 15:48 +0800, Miaohe Lin wrote:
> On 2022/3/26 4:14, Rik van Riel wrote:
> >
> > +++ b/mm/memory.c
> > @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> >  		return ret;
> >
> >  	if (unlikely(PageHWPoison(vmf->page))) {
> > +		struct page *page = vmf->page;
> >  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
> >  		if (ret & VM_FAULT_LOCKED) {
> > +			if (page_mapped(page))
> > +				unmap_mapping_pages(page_mapping(page),
> > +						    page->index, 1, false);
>
> It seems this unmap_mapping_pages also helps the success rate of the
> invalidate_inode_page call below.

That is indeed what it is supposed to do.

It isn't foolproof, since you can still end up with dirty pages that
don't get cleaned immediately, but it seems to turn infinite loops of
a program being killed every time it's started into a more manageable
situation where the task succeeds again pretty quickly.

> >  			/* Retry if a clean page was removed from the cache. */
> > -			if (invalidate_inode_page(vmf->page))
> > -				poisonret = 0;
> > -			unlock_page(vmf->page);
> > +			if (invalidate_inode_page(page))
> > +				poisonret = VM_FAULT_NOPAGE;
> > +			unlock_page(page);
> >  		}
> > -		put_page(vmf->page);
> > +		put_page(page);
>
> Do we use page instead of vmf->page just for simplicity? Or is there
> some other concern?

Just a simplification, and to avoid dereferencing the same thing six
times.

> > 		vmf->page = NULL;
>
> We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with
> vmf->page = NULL. In either case, finish_fault won't be called later.
> So I think your fix is right.

Want to send in a Reviewed-by or Acked-by? :)
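For context: the page_mapped() bail-out that the new unmap_mapping_pages
call works around sits inside invalidate_inode_page() itself. A
simplified sketch of that function from mm/truncate.c in this era (not
the verbatim code):

int invalidate_inode_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);

	if (!mapping)
		return 0;
	/* Dirty or under-writeback pages still cannot be dropped. */
	if (PageDirty(page) || PageWriteback(page))
		return 0;
	/*
	 * A page mapped elsewhere fails invalidation outright; this is
	 * why unmapping it first raises the success rate.
	 */
	if (page_mapped(page))
		return 0;
	return invalidate_complete_page(mapping, page);
}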
On 2022/3/27 4:14, Rik van Riel wrote:
> On Sat, 2022-03-26 at 15:48 +0800, Miaohe Lin wrote:
>> On 2022/3/26 4:14, Rik van Riel wrote:
>>>
>>> +++ b/mm/memory.c
>>> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>>>  		return ret;
>>>
>>>  	if (unlikely(PageHWPoison(vmf->page))) {
>>> +		struct page *page = vmf->page;
>>>  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
>>>  		if (ret & VM_FAULT_LOCKED) {
>>> +			if (page_mapped(page))
>>> +				unmap_mapping_pages(page_mapping(page),
>>> +						    page->index, 1, false);
>>
>> It seems this unmap_mapping_pages also helps the success rate of the
>> invalidate_inode_page call below.
>
> That is indeed what it is supposed to do.
>
> It isn't foolproof, since you can still end up with dirty pages that
> don't get cleaned immediately, but it seems to turn infinite loops of
> a program being killed every time it's started into a more manageable
> situation where the task succeeds again pretty quickly.

Looks convincing to me.

>>>  			/* Retry if a clean page was removed from the cache. */
>>> -			if (invalidate_inode_page(vmf->page))
>>> -				poisonret = 0;
>>> -			unlock_page(vmf->page);
>>> +			if (invalidate_inode_page(page))
>>> +				poisonret = VM_FAULT_NOPAGE;
>>> +			unlock_page(page);
>>>  		}
>>> -		put_page(vmf->page);
>>> +		put_page(page);
>>
>> Do we use page instead of vmf->page just for simplicity? Or is there
>> some other concern?
>
> Just a simplification, and to avoid dereferencing the same thing six
> times.

I see. :)

>>> 		vmf->page = NULL;
>>
>> We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with
>> vmf->page = NULL. In either case, finish_fault won't be called later.
>> So I think your fix is right.
>
> Want to send in a Reviewed-by or Acked-by? :)

Sure, but when I think more about this, it seems this fix isn't ideal:
if VM_FAULT_NOPAGE is returned with the page table entry unset, the
process will re-trigger the page fault again and again until
invalidate_inode_page succeeds in evicting the inode page. This might
hang the process for a really long time. Or am I missing something?

Thanks.
On Mon, 2022-03-28 at 10:14 +0800, Miaohe Lin wrote:
> On 2022/3/27 4:14, Rik van Riel wrote:
>
> > > >  			/* Retry if a clean page was removed from the cache. */
> > > > -			if (invalidate_inode_page(vmf->page))
> > > > -				poisonret = 0;
> > > > -			unlock_page(vmf->page);
> > > > +			if (invalidate_inode_page(page))
> > > > +				poisonret = VM_FAULT_NOPAGE;
> > > > +			unlock_page(page);
>
> Sure, but when I think more about this, it seems this fix isn't ideal:
> if VM_FAULT_NOPAGE is returned with the page table entry unset, the
> process will re-trigger the page fault again and again until
> invalidate_inode_page succeeds in evicting the inode page. This might
> hang the process for a really long time. Or am I missing something?

If invalidate_inode_page fails, we will return VM_FAULT_HWPOISON,
and kill the task, instead of looping indefinitely.
On 2022/3/28 10:24, Rik van Riel wrote:
> On Mon, 2022-03-28 at 10:14 +0800, Miaohe Lin wrote:
>> On 2022/3/27 4:14, Rik van Riel wrote:
>>
>>>>>  			/* Retry if a clean page was removed from the cache. */
>>>>> -			if (invalidate_inode_page(vmf->page))
>>>>> -				poisonret = 0;
>>>>> -			unlock_page(vmf->page);
>>>>> +			if (invalidate_inode_page(page))
>>>>> +				poisonret = VM_FAULT_NOPAGE;
>>>>> +			unlock_page(page);
>>
>> Sure, but when I think more about this, it seems this fix isn't ideal:
>> if VM_FAULT_NOPAGE is returned with the page table entry unset, the
>> process will re-trigger the page fault again and again until
>> invalidate_inode_page succeeds in evicting the inode page. This might
>> hang the process for a really long time. Or am I missing something?
>>
> If invalidate_inode_page fails, we will return VM_FAULT_HWPOISON,
> and kill the task, instead of looping indefinitely.

Oh, really sorry! It's a drowsy Monday morning. :)
This patch looks good to me. Thanks!

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org
> ---
>  mm/memory.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  		return ret;
>
>  	if (unlikely(PageHWPoison(vmf->page))) {
> +		struct page *page = vmf->page;
>  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
>  		if (ret & VM_FAULT_LOCKED) {
> +			if (page_mapped(page))
> +				unmap_mapping_pages(page_mapping(page),
> +						    page->index, 1, false);
>  			/* Retry if a clean page was removed from the cache. */
> -			if (invalidate_inode_page(vmf->page))
> -				poisonret = 0;
> -			unlock_page(vmf->page);
> +			if (invalidate_inode_page(page))
> +				poisonret = VM_FAULT_NOPAGE;

What is the effect of returning VM_FAULT_NOPAGE?
I take it that we are cool because the pte has been installed and
points to a new page? (I could not find where that is being done.)
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.

I artificially created clean and dirty page cache pages with the
PageHWPoison flag set (using SystemTap), then reproduced the NULL
pointer dereference via page fault on the current mainline branch
(with e53ac7374e64). I confirmed that the bug is fixed with this
patch, so the fix seems to work.
(Maybe I should've done this kind of testing before merging
e53ac7374e64, sorry..)

Anyway, thank you very much.

Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org
> ---
>  mm/memory.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  		return ret;
>
>  	if (unlikely(PageHWPoison(vmf->page))) {
> +		struct page *page = vmf->page;
>  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
>  		if (ret & VM_FAULT_LOCKED) {
> +			if (page_mapped(page))
> +				unmap_mapping_pages(page_mapping(page),
> +						    page->index, 1, false);
>  			/* Retry if a clean page was removed from the cache. */
> -			if (invalidate_inode_page(vmf->page))
> -				poisonret = 0;
> -			unlock_page(vmf->page);
> +			if (invalidate_inode_page(page))
> +				poisonret = VM_FAULT_NOPAGE;
> +			unlock_page(page);
>  		}
> -		put_page(vmf->page);
> +		put_page(page);
>  		vmf->page = NULL;
>  		return poisonret;
>  	}
> --
> 2.35.1
On Mon, 2022-03-28 at 11:00 +0200, Oscar Salvador wrote:
> On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> > +			if (invalidate_inode_page(page))
> > +				poisonret = VM_FAULT_NOPAGE;
>
> What is the effect of returning VM_FAULT_NOPAGE?
> I take it that we are cool because the pte has been installed and
> points to a new page? (I could not find where that is being done.)

It results in us returning to userspace as if the page fault had been
handled, resulting in a second fault on the same address.

However, now the page is no longer in the page cache, and we can read
it in from disk, to a page that is not hardware poisoned, and we can
then use that second page without issues.
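For context: the difference between the two return values can be
sketched from the arch fault handler's point of view. A rough,
x86-flavoured sketch (heavily simplified; fault_outcome_sketch is a
hypothetical stand-in, not a real kernel function):

static void fault_outcome_sketch(struct vm_area_struct *vma,
				 unsigned long address, unsigned int flags,
				 struct pt_regs *regs)
{
	vm_fault_t fault = handle_mm_fault(vma, address, flags, regs);

	if (fault & VM_FAULT_ERROR) {
		/*
		 * VM_FAULT_HWPOISON is an error bit: deliver SIGBUS
		 * (BUS_MCEERR_AR), which usually kills the task.
		 */
		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address,
				 PAGE_SHIFT);
		return;
	}
	/*
	 * VM_FAULT_NOPAGE is not an error bit: simply return to
	 * userspace. The faulting instruction re-executes, faults
	 * again, and the second fault repopulates the page cache from
	 * disk with a clean, unpoisoned page.
	 */
}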
On Tue, Mar 29, 2022 at 11:49:53AM -0400, Rik van Riel wrote:
> It results in us returning to userspace as if the page fault had been
> handled, resulting in a second fault on the same address.
>
> However, now the page is no longer in the page cache, and we can read
> it in from disk, to a page that is not hardware poisoned, and we can
> then use that second page without issues.

Ok, I see. Thanks a lot for the explanation, Rik.
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/memory.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  		return ret;
>
>  	if (unlikely(PageHWPoison(vmf->page))) {
> +		struct page *page = vmf->page;
>  		vm_fault_t poisonret = VM_FAULT_HWPOISON;
>  		if (ret & VM_FAULT_LOCKED) {
> +			if (page_mapped(page))
> +				unmap_mapping_pages(page_mapping(page),
> +						    page->index, 1, false);
>  			/* Retry if a clean page was removed from the cache. */
> -			if (invalidate_inode_page(vmf->page))
> -				poisonret = 0;
> -			unlock_page(vmf->page);
> +			if (invalidate_inode_page(page))
> +				poisonret = VM_FAULT_NOPAGE;
> +			unlock_page(page);
>  		}
> -		put_page(vmf->page);
> +		put_page(page);
>  		vmf->page = NULL;
>  		return poisonret;
>  	}
> --
> 2.35.1
diff --git a/mm/memory.c b/mm/memory.c
index be44d0b36b18..76e3af9639d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}