diff mbox series

filemap: Remove PageHWPoison check from next_uptodate_page()

Message ID 20211120174429.2596303-1-willy@infradead.org (mailing list archive)
State New
Headers show
Series filemap: Remove PageHWPoison check from next_uptodate_page() | expand

Commit Message

Matthew Wilcox (Oracle) Nov. 20, 2021, 5:44 p.m. UTC
Pages are individually marked as suffering from hardware poisoning.
Checking that the head page is not hardware poisoned doesn't make
sense; we might be after a subpage.  We check each page individually
before we use it, so this was an optimisation gone wrong.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 2 --
 1 file changed, 2 deletions(-)

Comments

HORIGUCHI NAOYA(堀口 直也) Nov. 21, 2021, 11:35 p.m. UTC | #1
On Sat, Nov 20, 2021 at 05:44:29PM +0000, Matthew Wilcox (Oracle) wrote:
> Pages are individually marked as suffering from hardware poisoning.
> Checking that the head page is not hardware poisoned doesn't make
> sense; we might be after a subpage.  We check each page individually
> before we use it, so this was an optimisation gone wrong.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good to me. Thank you.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Yang Shi Nov. 22, 2021, 7:28 p.m. UTC | #2
On Sat, Nov 20, 2021 at 9:44 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Pages are individually marked as suffering from hardware poisoning.
> Checking that the head page is not hardware poisoned doesn't make
> sense; we might be after a subpage.  We check each page individually
> before we use it, so this was an optimisation gone wrong.

Yeah, it doesn't make too much sense to check the head page. And it
seems the non-poisoned subpages could be PTE mapped instead of
skipping the whole THP.

Not sure if this is by design, it seems the hwpoisoned check in
filemap_map_pages() does skip the subpages after the poisoned page. Or
we should just skip the poisoned page itself? If so the below change
may be needed:

diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..f1f0cb263b4a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3318,7 +3318,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
        do {
                page = find_subpage(head, xas.xa_index);
                if (PageHWPoison(page))
-                       goto unlock;
+                       goto skip;

                if (mmap_miss > 0)
                        mmap_miss--;
@@ -3337,6 +3337,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
                do_set_pte(vmf, page, addr);
                /* no need to invalidate: a not-present page won't be cached */
                update_mmu_cache(vma, addr, vmf->pte);
+skip:
                unlock_page(head);
                continue;
 unlock:


>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/filemap.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 0b6f996108b4..65973204112d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3239,8 +3239,6 @@ static struct page *next_uptodate_page(struct page *page,
>                         goto skip;
>                 if (!PageUptodate(page) || PageReadahead(page))
>                         goto skip;
> -               if (PageHWPoison(page))
> -                       goto skip;
>                 if (!trylock_page(page))
>                         goto skip;
>                 if (page->mapping != mapping)
> --
> 2.33.0
>
HORIGUCHI NAOYA(堀口 直也) Nov. 24, 2021, 12:11 a.m. UTC | #3
On Mon, Nov 22, 2021 at 11:28:10AM -0800, Yang Shi wrote:
> On Sat, Nov 20, 2021 at 9:44 AM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > Pages are individually marked as suffering from hardware poisoning.
> > Checking that the head page is not hardware poisoned doesn't make
> > sense; we might be after a subpage.  We check each page individually
> > before we use it, so this was an optimisation gone wrong.
> 
> Yeah, it doesn't make too much sense to check the head page. And it
> seems the non-poisoned subpages could be PTE mapped instead of
> skipping the whole THP.
> 
> Not sure if this is by design, it seems the hwpoisoned check in
> filemap_map_pages() does skip the subpages after the poisoned page. Or
> we should just skip the poisoned page itself? If so the below change
> may be needed:
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index daa0e23a6ee6..f1f0cb263b4a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3318,7 +3318,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>         do {
>                 page = find_subpage(head, xas.xa_index);
>                 if (PageHWPoison(page))
> -                       goto unlock;
> +                       goto skip;
> 
>                 if (mmap_miss > 0)
>                         mmap_miss--;
> @@ -3337,6 +3337,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>                 do_set_pte(vmf, page, addr);
>                 /* no need to invalidate: a not-present page won't be cached */
>                 update_mmu_cache(vma, addr, vmf->pte);
> +skip:
>                 unlock_page(head);
>                 continue;
>  unlock:

first_map_page() or next_map_page() returns a page (if found) with
holding the refcount, and the new 'goto skip' path skips releasing it.
So this looks to me lead to the mismatch of refcount.
Could you explain the intention a little more (maybe related to your
recent patch about keeping hwpoison page in pagecache?) ?

Thanks,
Naoya Horiguchi

> 
> 
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/filemap.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 0b6f996108b4..65973204112d 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3239,8 +3239,6 @@ static struct page *next_uptodate_page(struct page *page,
> >                         goto skip;
> >                 if (!PageUptodate(page) || PageReadahead(page))
> >                         goto skip;
> > -               if (PageHWPoison(page))
> > -                       goto skip;
> >                 if (!trylock_page(page))
> >                         goto skip;
> >                 if (page->mapping != mapping)
> > --
> > 2.33.0
> >
Yang Shi Nov. 24, 2021, 12:32 a.m. UTC | #4
On Tue, Nov 23, 2021 at 4:11 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Mon, Nov 22, 2021 at 11:28:10AM -0800, Yang Shi wrote:
> > On Sat, Nov 20, 2021 at 9:44 AM Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > >
> > > Pages are individually marked as suffering from hardware poisoning.
> > > Checking that the head page is not hardware poisoned doesn't make
> > > sense; we might be after a subpage.  We check each page individually
> > > before we use it, so this was an optimisation gone wrong.
> >
> > Yeah, it doesn't make too much sense to check the head page. And it
> > seems the non-poisoned subpages could be PTE mapped instead of
> > skipping the whole THP.
> >
> > Not sure if this is by design, it seems the hwpoisoned check in
> > filemap_map_pages() does skip the subpages after the poisoned page. Or
> > we should just skip the poisoned page itself? If so the below change
> > may be needed:
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index daa0e23a6ee6..f1f0cb263b4a 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3318,7 +3318,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> >         do {
> >                 page = find_subpage(head, xas.xa_index);
> >                 if (PageHWPoison(page))
> > -                       goto unlock;
> > +                       goto skip;
> >
> >                 if (mmap_miss > 0)
> >                         mmap_miss--;
> > @@ -3337,6 +3337,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> >                 do_set_pte(vmf, page, addr);
> >                 /* no need to invalidate: a not-present page won't be cached */
> >                 update_mmu_cache(vma, addr, vmf->pte);
> > +skip:
> >                 unlock_page(head);
> >                 continue;
> >  unlock:
>
> first_map_page() or next_map_page() returns a page (if found) with
> holding the refcount, and the new 'goto skip' path skips releasing it.
> So this looks to me lead to the mismatch of refcount.
> Could you explain the intention a little more (maybe related to your
> recent patch about keeping hwpoison page in pagecache?) ?

No, not related to my patches.

The current code maps the subpages by PTEs *before* the poisoned page,
but skips the subpages *after* the poisoned page IIUC. It seems not
right, I thought the code was intended to map all subpages by PTEs
except the poisoned pages. So the suggested code is trying to fix the
misbehavior.

That code is just a quick and untested illustration to the above
hypothesis. The corrected version:

diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..1a76e3edc878 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3317,8 +3317,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
        do {
                page = find_subpage(head, xas.xa_index);
-               if (PageHWPoison(page))
-                       goto unlock;
+               if (PageHWPoison(page)) {
+                       unlock_page(page);
+                       put_page(page);
+                       continue;
+               }

                if (mmap_miss > 0)
                        mmap_miss--;

>
> Thanks,
> Naoya Horiguchi
>
> >
> >
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  mm/filemap.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 0b6f996108b4..65973204112d 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -3239,8 +3239,6 @@ static struct page *next_uptodate_page(struct page *page,
> > >                         goto skip;
> > >                 if (!PageUptodate(page) || PageReadahead(page))
> > >                         goto skip;
> > > -               if (PageHWPoison(page))
> > > -                       goto skip;
> > >                 if (!trylock_page(page))
> > >                         goto skip;
> > >                 if (page->mapping != mapping)
> > > --
> > > 2.33.0
> > >
Yang Shi Nov. 24, 2021, 12:57 a.m. UTC | #5
On Tue, Nov 23, 2021 at 4:32 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:11 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 11:28:10AM -0800, Yang Shi wrote:
> > > On Sat, Nov 20, 2021 at 9:44 AM Matthew Wilcox (Oracle)
> > > <willy@infradead.org> wrote:
> > > >
> > > > Pages are individually marked as suffering from hardware poisoning.
> > > > Checking that the head page is not hardware poisoned doesn't make
> > > > sense; we might be after a subpage.  We check each page individually
> > > > before we use it, so this was an optimisation gone wrong.
> > >
> > > Yeah, it doesn't make too much sense to check the head page. And it
> > > seems the non-poisoned subpages could be PTE mapped instead of
> > > skipping the whole THP.
> > >
> > > Not sure if this is by design, it seems the hwpoisoned check in
> > > filemap_map_pages() does skip the subpages after the poisoned page. Or
> > > we should just skip the poisoned page itself? If so the below change
> > > may be needed:
> > >
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index daa0e23a6ee6..f1f0cb263b4a 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -3318,7 +3318,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> > >         do {
> > >                 page = find_subpage(head, xas.xa_index);
> > >                 if (PageHWPoison(page))
> > > -                       goto unlock;
> > > +                       goto skip;
> > >
> > >                 if (mmap_miss > 0)
> > >                         mmap_miss--;
> > > @@ -3337,6 +3337,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> > >                 do_set_pte(vmf, page, addr);
> > >                 /* no need to invalidate: a not-present page won't be cached */
> > >                 update_mmu_cache(vma, addr, vmf->pte);
> > > +skip:
> > >                 unlock_page(head);
> > >                 continue;
> > >  unlock:
> >
> > first_map_page() or next_map_page() returns a page (if found) with
> > holding the refcount, and the new 'goto skip' path skips releasing it.
> > So this looks to me lead to the mismatch of refcount.
> > Could you explain the intention a little more (maybe related to your
> > recent patch about keeping hwpoison page in pagecache?) ?
>
> No, not related to my patches.
>
> The current code maps the subpages by PTEs *before* the poisoned page,
> but skips the subpages *after* the poisoned page IIUC. It seems not
> right, I thought the code was intended to map all subpages by PTEs
> except the poisoned pages. So the suggested code is trying to fix the
> misbehavior.

Err... I think I misread the code. It does iterate every subpages to
map each of them by PTE and just skip the hwpoisoned subpages.

Sorry for the confusion.

>
> That code is just a quick and untested illustration to the above
> hypothesis. The corrected version:
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index daa0e23a6ee6..1a76e3edc878 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3317,8 +3317,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>         vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>         do {
>                 page = find_subpage(head, xas.xa_index);
> -               if (PageHWPoison(page))
> -                       goto unlock;
> +               if (PageHWPoison(page)) {
> +                       unlock_page(page);
> +                       put_page(page);
> +                       continue;
> +               }
>
>                 if (mmap_miss > 0)
>                         mmap_miss--;
>
> >
> > Thanks,
> > Naoya Horiguchi
> >
> > >
> > >
> > > >
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > ---
> > > >  mm/filemap.c | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 0b6f996108b4..65973204112d 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -3239,8 +3239,6 @@ static struct page *next_uptodate_page(struct page *page,
> > > >                         goto skip;
> > > >                 if (!PageUptodate(page) || PageReadahead(page))
> > > >                         goto skip;
> > > > -               if (PageHWPoison(page))
> > > > -                       goto skip;
> > > >                 if (!trylock_page(page))
> > > >                         goto skip;
> > > >                 if (page->mapping != mapping)
> > > > --
> > > > 2.33.0
> > > >
Andrew Morton Nov. 24, 2021, 3:24 a.m. UTC | #6
On Sat, 20 Nov 2021 17:44:29 +0000 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Pages are individually marked as suffering from hardware poisoning.
> Checking that the head page is not hardware poisoned doesn't make
> sense; we might be after a subpage.  We check each page individually
> before we use it, so this was an optimisation gone wrong.

What were the runtime effects of this bug?
Matthew Wilcox (Oracle) Nov. 25, 2021, 2:51 p.m. UTC | #7
On Tue, Nov 23, 2021 at 07:24:42PM -0800, Andrew Morton wrote:
> On Sat, 20 Nov 2021 17:44:29 +0000 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> 
> > Pages are individually marked as suffering from hardware poisoning.
> > Checking that the head page is not hardware poisoned doesn't make
> > sense; we might be after a subpage.  We check each page individually
> > before we use it, so this was an optimisation gone wrong.
> 
> What were the runtime effects of this bug?

We'd fall back to the slow path when there was no need to do that.
diff mbox series

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 0b6f996108b4..65973204112d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3239,8 +3239,6 @@  static struct page *next_uptodate_page(struct page *page,
 			goto skip;
 		if (!PageUptodate(page) || PageReadahead(page))
 			goto skip;
-		if (PageHWPoison(page))
-			goto skip;
 		if (!trylock_page(page))
 			goto skip;
 		if (page->mapping != mapping)