[v18,31/32] mm: Add explicit page decrement in exception path for isolate_lru_pages

Message ID 1598273705-69124-32-git-send-email-alex.shi@linux.alibaba.com (mailing list archive)
State New, archived
Series per memcg lru_lock

Commit Message

Alex Shi Aug. 24, 2020, 12:55 p.m. UTC
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

In isolate_lru_pages we have an exception path where if we call
get_page_unless_zero and that succeeds, but TestClearPageLRU fails we call
put_page. Normally this would be problematic but due to the way that the
calls are ordered and the fact that we are holding the LRU lock we know
that the caller must be holding another reference for the page. Since we
can assume that, we can replace the put_page with a call to
put_page_testzero contained within a WARN_ON. By doing this we should see
if we ever leak a page as a result of the reference count somehow hitting
zero when it shouldn't, and can avoid the overhead and confusion of using
the full put_page call.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 mm/vmscan.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Matthew Wilcox Sept. 9, 2020, 1:01 a.m. UTC | #1
On Mon, Aug 24, 2020 at 08:55:04PM +0800, Alex Shi wrote:
> +++ b/mm/vmscan.c
> @@ -1688,10 +1688,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  
>  			if (!TestClearPageLRU(page)) {
>  				/*
> -				 * This page may in other isolation path,
> -				 * but we still hold lru_lock.
> +				 * This page is being isolated in another
> +				 * thread, but we still hold lru_lock. The
> +				 * other thread must be holding a reference
> +				 * to the page so this should never hit a
> +				 * reference count of 0.
>  				 */
> -				put_page(page);
> +				WARN_ON(put_page_testzero(page));
>  				goto busy;

I read Hugh's review and that led me to take a look at this.  We don't
do it like this.  Use the same pattern as elsewhere in mm:

        page_ref_sub(page, nr);
        VM_BUG_ON_PAGE(page_count(page) <= 0, page);
Alexander Duyck Sept. 9, 2020, 3:43 p.m. UTC | #2
On Tue, Sep 8, 2020 at 6:01 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Aug 24, 2020 at 08:55:04PM +0800, Alex Shi wrote:
> > +++ b/mm/vmscan.c
> > @@ -1688,10 +1688,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >
> >                       if (!TestClearPageLRU(page)) {
> >                               /*
> > -                              * This page may in other isolation path,
> > -                              * but we still hold lru_lock.
> > +                              * This page is being isolated in another
> > +                              * thread, but we still hold lru_lock. The
> > +                              * other thread must be holding a reference
> > +                              * to the page so this should never hit a
> > +                              * reference count of 0.
> >                                */
> > -                             put_page(page);
> > +                             WARN_ON(put_page_testzero(page));
> >                               goto busy;
>
> I read Hugh's review and that led me to take a look at this.  We don't
> do it like this.  Use the same pattern as elsewhere in mm:
>
>         page_ref_sub(page, nr);
>         VM_BUG_ON_PAGE(page_count(page) <= 0, page);
>
>

Actually for this case page_ref_dec(page) would make more sense
wouldn't it? Otherwise I agree that would be a better change if that
is the way it has been handled before. I just wasn't familiar with
those other spots.

Thanks.

- Alex
Matthew Wilcox Sept. 9, 2020, 5:07 p.m. UTC | #3
On Wed, Sep 09, 2020 at 08:43:38AM -0700, Alexander Duyck wrote:
> On Tue, Sep 8, 2020 at 6:01 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Aug 24, 2020 at 08:55:04PM +0800, Alex Shi wrote:
> > > +++ b/mm/vmscan.c
> > > @@ -1688,10 +1688,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > >
> > >                       if (!TestClearPageLRU(page)) {
> > >                               /*
> > > -                              * This page may in other isolation path,
> > > -                              * but we still hold lru_lock.
> > > +                              * This page is being isolated in another
> > > +                              * thread, but we still hold lru_lock. The
> > > +                              * other thread must be holding a reference
> > > +                              * to the page so this should never hit a
> > > +                              * reference count of 0.
> > >                                */
> > > -                             put_page(page);
> > > +                             WARN_ON(put_page_testzero(page));
> > >                               goto busy;
> >
> > I read Hugh's review and that led me to take a look at this.  We don't
> > do it like this.  Use the same pattern as elsewhere in mm:
> >
> >         page_ref_sub(page, nr);
> >         VM_BUG_ON_PAGE(page_count(page) <= 0, page);
> 
> Actually for this case page_ref_dec(page) would make more sense
> wouldn't it? Otherwise I agree that would be a better change if that
> is the way it has been handled before. I just wasn't familiar with
> those other spots.

Yes, page_ref_dec() should be fine.  It's hard to remember which of
VM_BUG_ON, WARN_ON, etc, compile down to nothing with various CONFIG
options, and which ones actually evaluate their arguments.  Safer not
to put things with side-effects inside macros.
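
The point about side effects inside macros can be sketched in user space. This is a toy model only: struct toy_page, toy_put_testzero() and TOY_DEBUG_CHECK() are hypothetical names made up for illustration, and the macro merely imitates the general idea of a check that is type-checked but not evaluated when a debug option is off. It does not claim to reproduce any particular kernel macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted page (not the kernel's struct page). */
struct toy_page {
	int refcount;
};

/* Decrement and report whether the count reached zero. */
bool toy_put_testzero(struct toy_page *page)
{
	return --page->refcount == 0;
}

/*
 * Hypothetical debug-only check. Without TOY_DEBUG the condition sits
 * inside sizeof, so it is compiled but never evaluated at run time.
 */
#ifdef TOY_DEBUG
#define TOY_DEBUG_CHECK(cond) assert(!(cond))
#else
#define TOY_DEBUG_CHECK(cond) ((void)sizeof(!(cond)))
#endif

/* Fragile: the decrement only happens if the macro evaluates its argument. */
int drop_ref_inside_macro(struct toy_page *page)
{
	TOY_DEBUG_CHECK(toy_put_testzero(page));
	return page->refcount;
}

/* Robust: decrement unconditionally, then check a side-effect-free expression. */
int drop_ref_then_check(struct toy_page *page)
{
	page->refcount--;
	TOY_DEBUG_CHECK(page->refcount <= 0);
	return page->refcount;
}
```

Built without TOY_DEBUG, drop_ref_inside_macro() leaves a count of 2 untouched, while drop_ref_then_check() brings it down to 1 in either configuration.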
Hugh Dickins Sept. 9, 2020, 6:24 p.m. UTC | #4
On Wed, 9 Sep 2020, Alexander Duyck wrote:
> On Tue, Sep 8, 2020 at 6:01 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Mon, Aug 24, 2020 at 08:55:04PM +0800, Alex Shi wrote:
> > > +++ b/mm/vmscan.c
> > > @@ -1688,10 +1688,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > >
> > >                       if (!TestClearPageLRU(page)) {
> > >                               /*
> > > -                              * This page may in other isolation path,
> > > -                              * but we still hold lru_lock.
> > > +                              * This page is being isolated in another
> > > +                              * thread, but we still hold lru_lock. The
> > > +                              * other thread must be holding a reference
> > > +                              * to the page so this should never hit a
> > > +                              * reference count of 0.
> > >                                */
> > > -                             put_page(page);
> > > +                             WARN_ON(put_page_testzero(page));
> > >                               goto busy;
> >
> > I read Hugh's review and that led me to take a look at this.  We don't
> > do it like this.  Use the same pattern as elsewhere in mm:
> >
> >         page_ref_sub(page, nr);
> >         VM_BUG_ON_PAGE(page_count(page) <= 0, page);
> >
> >
> 
> Actually for this case page_ref_dec(page) would make more sense
> wouldn't it? Otherwise I agree that would be a better change if that
> is the way it has been handled before. I just wasn't familiar with
> those other spots.

After overnight reflection, my own preference would be simply to
drop this patch.  I think we are making altogether too much of a
fuss here over what was simply correct as plain put_page()
(and further from correct if we change it to leak the page in an
unforeseen circumstance).

And if Alex's comment was not quite grammatically correct, never mind,
it said as much as was worth saying.  I got more worried by his
placement of the "busy:" label, but that does appear to work correctly.

There's probably a thousand places where put_page() is used, where
it would be troublesome if it were the final put_page(): this one
bothered you because you'd been looking at isolate_migratepages_block(),
and its necessary avoidance of lru_lock recursion on put_page();
but let's just leave this put_page() as is.

Hugh
Matthew Wilcox Sept. 9, 2020, 8:15 p.m. UTC | #5
On Wed, Sep 09, 2020 at 11:24:14AM -0700, Hugh Dickins wrote:
> After overnight reflection, my own preference would be simply to
> drop this patch.  I think we are making altogether too much of a
> fuss here over what was simply correct as plain put_page()
> (and further from correct if we change it to leak the page in an
> unforeseen circumstance).
> 
> And if Alex's comment was not quite grammatically correct, never mind,
> it said as much as was worth saying.  I got more worried by his
> placement of the "busy:" label, but that does appear to work correctly.
> 
> There's probably a thousand places where put_page() is used, where
> it would be troublesome if it were the final put_page(): this one
> bothered you because you'd been looking at isolate_migratepages_block(),
> and its necessary avoidance of lru_lock recursion on put_page();
> but let's just leave this put_page() as is.

My problem with put_page() is that it's no longer the simple
decrement-and-branch-to-slow-path-if-zero that it used to be.  It has the
awful devmap excrement in it so it really expands into a lot of code.
I really wish that "feature" could be backed out again.  It clearly
wasn't ready for merge.
Hugh Dickins Sept. 9, 2020, 9:05 p.m. UTC | #6
On Wed, 9 Sep 2020, Matthew Wilcox wrote:
> On Wed, Sep 09, 2020 at 11:24:14AM -0700, Hugh Dickins wrote:
> > After overnight reflection, my own preference would be simply to
> > drop this patch.  I think we are making altogether too much of a
> > fuss here over what was simply correct as plain put_page()
> > (and further from correct if we change it to leak the page in an
> > unforeseen circumstance).
> > 
> > And if Alex's comment was not quite grammatically correct, never mind,
> > it said as much as was worth saying.  I got more worried by his
> > placement of the "busy:" label, but that does appear to work correctly.
> > 
> > There's probably a thousand places where put_page() is used, where
> > it would be troublesome if it were the final put_page(): this one
> > bothered you because you'd been looking at isolate_migratepages_block(),
> > and its necessary avoidance of lru_lock recursion on put_page();
> > but let's just leave this put_page() as is.
> 
> My problem with put_page() is that it's no longer the simple
> decrement-and-branch-to-slow-path-if-zero that it used to be.  It has the
> awful devmap excrement in it so it really expands into a lot of code.
> I really wish that "feature" could be backed out again.  It clearly
> wasn't ready for merge.

And I suppose I should thank you for opening my eyes to that.
I knew there was "dev" stuff inside __put_page(), but didn't
realize that the inline put_page() has now been defiled.
Yes, I agree, that is horrid and begs to be undone.

But this is not the mail thread for discussing that, and we should
not use strange alternatives to put_page(), here or elsewhere,
just to avoid that (surely? hopefully?) temporary excrescence.

Hugh
Alexander Duyck Sept. 9, 2020, 9:17 p.m. UTC | #7
On Wed, Sep 9, 2020 at 11:24 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Wed, 9 Sep 2020, Alexander Duyck wrote:
> > On Tue, Sep 8, 2020 at 6:01 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Mon, Aug 24, 2020 at 08:55:04PM +0800, Alex Shi wrote:
> > > > +++ b/mm/vmscan.c
> > > > @@ -1688,10 +1688,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > > >
> > > >                       if (!TestClearPageLRU(page)) {
> > > >                               /*
> > > > -                              * This page may in other isolation path,
> > > > -                              * but we still hold lru_lock.
> > > > +                              * This page is being isolated in another
> > > > +                              * thread, but we still hold lru_lock. The
> > > > +                              * other thread must be holding a reference
> > > > +                              * to the page so this should never hit a
> > > > +                              * reference count of 0.
> > > >                                */
> > > > -                             put_page(page);
> > > > +                             WARN_ON(put_page_testzero(page));
> > > >                               goto busy;
> > >
> > > I read Hugh's review and that led me to take a look at this.  We don't
> > > do it like this.  Use the same pattern as elsewhere in mm:
> > >
> > >         page_ref_sub(page, nr);
> > >         VM_BUG_ON_PAGE(page_count(page) <= 0, page);
> > >
> > >
> >
> > Actually for this case page_ref_dec(page) would make more sense
> > wouldn't it? Otherwise I agree that would be a better change if that
> > is the way it has been handled before. I just wasn't familiar with
> > those other spots.
>
> After overnight reflection, my own preference would be simply to
> drop this patch.  I think we are making altogether too much of a
> fuss here over what was simply correct as plain put_page()
> (and further from correct if we change it to leak the page in an
> unforeseen circumstance).
>
> And if Alex's comment was not quite grammatically correct, never mind,
> it said as much as was worth saying.  I got more worried by his
> placement of the "busy:" label, but that does appear to work correctly.
>
> There's probably a thousand places where put_page() is used, where
> it would be troublesome if it were the final put_page(): this one
> bothered you because you'd been looking at isolate_migratepages_block(),
> and its necessary avoidance of lru_lock recursion on put_page();
> but let's just leave this put_page() as is.

I'd be fine with that, but I would still like to see the comment
updated. At a minimum we should make it clear that we believe that
put_page is safe here as it should never reach zero and if it does
then we are looking at a bug. Then if this starts triggering soft
lockups we at least have documentation somewhere that someone can
reference on what we expected and why we triggered a lockup.

- Alex
Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 601fbcb994fb..604240303ea2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1688,10 +1688,13 @@  static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 			if (!TestClearPageLRU(page)) {
 				/*
-				 * This page may in other isolation path,
-				 * but we still hold lru_lock.
+				 * This page is being isolated in another
+				 * thread, but we still hold lru_lock. The
+				 * other thread must be holding a reference
+				 * to the page so this should never hit a
+				 * reference count of 0.
 				 */
-				put_page(page);
+				WARN_ON(put_page_testzero(page));
 				goto busy;
 			}
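
The exception path touched by this hunk can be modelled in user space to make the reference-count argument concrete. The following is a rough sketch using C11 atomics, not kernel code: every toy_* name is hypothetical, there is no lru_lock, and the flag and counter only imitate the roles of PageLRU and the page reference count. It shows the sequence described in the commit message: take a reference unless the count is zero, try to clear the LRU flag, and if the flag was already clear, drop the reference and report whether that was unexpectedly the last one.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page: a reference count plus an "on LRU" flag. */
struct toy_page {
	atomic_int refcount;
	atomic_bool on_lru;
};

enum toy_result { TOY_ISOLATED, TOY_BUSY, TOY_BUSY_LAST_REF };

/* Take a reference only if the count is currently non-zero. */
bool toy_get_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Clear the LRU flag and report whether it was previously set. */
bool toy_test_clear_lru(struct toy_page *page)
{
	return atomic_exchange(&page->on_lru, false);
}

/* Drop one reference; true means the caller dropped the last one. */
bool toy_put_testzero(struct toy_page *page)
{
	return atomic_fetch_sub(&page->refcount, 1) == 1;
}

/* Model of the isolation attempt, including the "busy" exception path. */
enum toy_result toy_try_isolate(struct toy_page *page)
{
	if (!toy_get_unless_zero(page))
		return TOY_BUSY;

	if (!toy_test_clear_lru(page)) {
		/*
		 * Someone else already cleared the flag and is expected to
		 * hold its own reference, so this drop should not be the last.
		 */
		return toy_put_testzero(page) ? TOY_BUSY_LAST_REF : TOY_BUSY;
	}
	return TOY_ISOLATED;
}
```

In this model TOY_BUSY_LAST_REF corresponds to the case the thread argues about: the patch would warn and leak the page there, while plain put_page() would free it.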