mm: Remove bogus VM_BUG_ON

Message ID	20210818144932.940640-1-willy@infradead.org (mailing list archive)
State	New
Headers	show Return-Path: <SRS0=AujH=NJ=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 21BDE60FE6 From: "Matthew Wilcox (Oracle)" <willy@infradead.org> To: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, stable@vger.kernel.org Subject: [PATCH] mm: Remove bogus VM_BUG_ON Date: Wed, 18 Aug 2021 15:49:32 +0100 Message-Id: <20210818144932.940640-1-willy@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	mm: Remove bogus VM_BUG_ON \| expand mm: Remove bogus VM_BUG_ON

Message ID

20210818144932.940640-1-willy@infradead.org (mailing list archive)

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 21BDE60FE6
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	stable@vger.kernel.org
Subject: [PATCH] mm: Remove bogus VM_BUG_ON
Date: Wed, 18 Aug 2021 15:49:32 +0100
Message-Id: <20210818144932.940640-1-willy@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

mm: Remove bogus VM_BUG_ON | expand

Commit Message

Matthew Wilcox Aug. 18, 2021, 2:49 p.m. UTC

It is not safe to check page->index without holding the page lock.
It can be changed if the page is moved between the swap cache and the
page cache for a shmem file, for example.  There is a VM_BUG_ON below
which checks page->index is correct after taking the page lock.

Cc: stable@vger.kernel.org
Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Hugh Dickins Aug. 18, 2021, 4:34 p.m. UTC | #1

On Wed, 18 Aug 2021, Matthew Wilcox (Oracle) wrote:

> It is not safe to check page->index without holding the page lock.
> It can be changed if the page is moved between the swap cache and the
> page cache for a shmem file, for example.  There is a VM_BUG_ON below
> which checks page->index is correct after taking the page lock.
> 
> Cc: stable@vger.kernel.org
> Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")

I don't mind that VM_BUG_ON_PAGE() being removed, but question whether
this Fixes anything, and needs to go to stable. Or maybe it's just that
the shmem example is wrong - moving shmem from page to swap cache does
not change page->index. Or maybe you have later changes in your tree
which change that and do require this. Otherwise, I'll have to worry
why my testing has missed it for six months.

Hugh

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/filemap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d1458ecf2f51..34de0b14aaa9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2033,17 +2033,16 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
>  	XA_STATE(xas, &mapping->i_pages, start);
>  	struct page *page;
>  
>  	rcu_read_lock();
>  	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
>  		if (!xa_is_value(page)) {
>  			if (page->index < start)
>  				goto put;
> -			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
>  			if (page->index + thp_nr_pages(page) - 1 > end)
>  				goto put;
>  			if (!trylock_page(page))
>  				goto put;
>  			if (page->mapping != mapping || PageWriteback(page))
>  				goto unlock;
>  			VM_BUG_ON_PAGE(!thp_contains(page, xas.xa_index),
>  					page);
> -- 
> 2.30.2

Matthew Wilcox Aug. 18, 2021, 4:45 p.m. UTC | #2

On Wed, Aug 18, 2021 at 09:34:51AM -0700, Hugh Dickins wrote:
> On Wed, 18 Aug 2021, Matthew Wilcox (Oracle) wrote:
> 
> > It is not safe to check page->index without holding the page lock.
> > It can be changed if the page is moved between the swap cache and the
> > page cache for a shmem file, for example.  There is a VM_BUG_ON below
> > which checks page->index is correct after taking the page lock.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
> 
> I don't mind that VM_BUG_ON_PAGE() being removed, but question whether
> this Fixes anything, and needs to go to stable. Or maybe it's just that
> the shmem example is wrong - moving shmem from page to swap cache does
> not change page->index. Or maybe you have later changes in your tree
> which change that and do require this. Otherwise, I'll have to worry
> why my testing has missed it for six months.

I'm sorry, I think you're going to have to worry :-(  Syzbot found
it initially:

https://lore.kernel.org/linux-mm/0000000000009cfcda05c926b34b@google.com/

and then I hit it today during my testing (which is definitely due to
further changes in my tree).

I should have added:

Reported-by: syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com

Hugh Dickins Aug. 19, 2021, 8:42 p.m. UTC | #3

On Wed, 18 Aug 2021, Matthew Wilcox wrote:
> On Wed, Aug 18, 2021 at 09:34:51AM -0700, Hugh Dickins wrote:
> > On Wed, 18 Aug 2021, Matthew Wilcox (Oracle) wrote:
> > 
> > > It is not safe to check page->index without holding the page lock.
> > > It can be changed if the page is moved between the swap cache and the
> > > page cache for a shmem file, for example.  There is a VM_BUG_ON below
> > > which checks page->index is correct after taking the page lock.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
> > 
> > I don't mind that VM_BUG_ON_PAGE() being removed, but question whether
> > this Fixes anything, and needs to go to stable. Or maybe it's just that
> > the shmem example is wrong - moving shmem from page to swap cache does
> > not change page->index. Or maybe you have later changes in your tree
> > which change that and do require this. Otherwise, I'll have to worry
> > why my testing has missed it for six months.
> 
> I'm sorry, I think you're going to have to worry :-(

Indeed, it seems that way; or maybe I can leave this testing to syzbot.

> Syzbot found it initially:
> 
> https://lore.kernel.org/linux-mm/0000000000009cfcda05c926b34b@google.com/

Ah, that's useful info.  Though I can quite see why you didn't mention
that originally: it looks as if syzbot hit a find_lock_entries() crash
and an irqstate warning about the same time, and its bisection went off
and found the commit that introduced those irqstate warnings: neither
the root cause of the irqstate warning, nor the cause of the
find_lock_entries() crash which it claims in the Subject.

I have briefly tried the C reproducer, but didn't get anything out of it;
and suspect it may be a reproducer of the irqstate warning rather than
the crash which interests you and me.  And I can't tell more from the
dump, no dump_page() info is shown, and the "Code:" just points into a
function epilog of assorted ud2s.

> 
> and then I hit it today during my testing (which is definitely due to
> further changes in my tree).

Okay, and it's perfectly reasonable for your tree to make changes which
require that VM_BUG_ON_PAGE to be removed. But I do not yet understand
why it needs to be removed from the current or stable tree.

I don't believe it has anything to do with swap cache. The reproducer
is mounting with "huge=within_size", and doing lots of truncation: my
supposition is that a shmem THP is being collapsed or split,
concurrently with that find_lock_entries().

I don't actually see how that would lead to this VM_BUG_ON_PAGE:
I imagine find_get_entry()'s xas_reload check after get_speculative
should be good enough - but don't know my way around XArray,
so mention this in case it triggers an Aha from you.

While there's certainly a sense in which removing the VM_BUG_ON_PAGE
removes the root cause of the crash, I don't think we understand
what is going on here yet: and therefore I'm reluctant to remove it.

But I have not given this issue much time, busy with other stuff.

> 
> I should have added:
> 
> Reported-by: syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com

That's fair, it did report it, if confusingly.

Hugh

(p.s. in parentheses, to minimize confusion from going slightly
off-topic, but I think I'd be wrong not to mention a separate
issue in this area, with mmotm and linux-next since your folios
went in: doesn't happen easily, but I have twice hit the
include/linux/pagemap.h:258 VM_BUG_ON_PAGE(PageTail(page), page),
in page_cache_add_speculative() - both times when serving
filemap_map_pages(). I have not thought about it at all, but
expect that when you do, you'll simply decide that one is unsafe
and has to be deleted.)

diff --git a/mm/filemap.c b/mm/filemap.c
index d1458ecf2f51..34de0b14aaa9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2033,17 +2033,16 @@  unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
 
 	rcu_read_lock();
 	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
 		if (!xa_is_value(page)) {
 			if (page->index < start)
 				goto put;
-			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
 			if (page->index + thp_nr_pages(page) - 1 > end)
 				goto put;
 			if (!trylock_page(page))
 				goto put;
 			if (page->mapping != mapping || PageWriteback(page))
 				goto unlock;
 			VM_BUG_ON_PAGE(!thp_contains(page, xas.xa_index),
 					page);

mm: Remove bogus VM_BUG_ON

Commit Message

Comments

Patch