bio linked list corruption.

Message ID	CA+55aFwC3nfmum6oPoBRU325CKjK-g3MuqW+bhq3+9DnE_4fgw@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@kernel.org> MIME-Version: 1.0 In-Reply-To: <CA+55aFzEG-6pkJXy4itiXsHPu_VV5q2vd6zJLq6vmi=AzdnaUw@mail.gmail.com> References: <CA+55aFyiJGk-A_cJH41Ec8Xj0Zz9M3EU-igJ9bgusj=nm28tFQ@mail.gmail.com> <2bdc068d-afd5-7a78-f334-26970c91aaca@fb.com> <CA+55aFxqRCEVxocpZVDMqinP=EsmgzcnvWpOT6x8NMmcBpWb2Q@mail.gmail.com> <203e0319-bc9b-245c-e162-709267540d22@fb.com> <20161026233808.GC15247@clm-mbp.thefacebook.com> <20161026234751.e66xyzjiwifvbuha@codemonkey.org.uk> <20161031185514.b22zvbxvga4xcinz@codemonkey.org.uk> <CA+55aFwUBYzPcbecpHw9=f8h_JuX18x5bE=Kd_k9QkCcgoiBsA@mail.gmail.com> <20161031194454.GA49877@clm-mbp.thefacebook.com> <20161123193419.pq7adje2eanky2wx@codemonkey.org.uk> <20161123195845.iphzr7ac4mu5ewjt@codemonkey.org.uk> <CAOMGZ=HPUoVZ8_yBLsMUHLigvKadKwBio-8gJ5-WM+2Fe4BZ6A@mail.gmail.com> <CAOMGZ=EsaGs9B3jc5X+3Sa1vyBfBW6AznYy3JR1=6iJMgCgG0g@mail.gmail.com> <CAOMGZ=HfdHLXrRCbzvXcQTgfA-PAErtwZQfPzFr4P8H0MJWZ5g@mail.gmail.com> <CA+55aFxeWe2VQaW30qGR0syiZ75jSwFwg3Ac+wS20KDtf5UKNw@mail.gmail.com> <CAOMGZ=Eb8mw6CfGaruM-tRir7pWn78vLTaCqp1pGPqkagfzPPA@mail.gmail.com> <CA+55aFzEG-6pkJXy4itiXsHPu_VV5q2vd6zJLq6vmi=AzdnaUw@mail.gmail.com> From: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon, 5 Dec 2016 12:35:52 -0800 Message-ID: <CA+55aFwC3nfmum6oPoBRU325CKjK-g3MuqW+bhq3+9DnE_4fgw@mail.gmail.com> Subject: Re: bio linked list corruption. To: Vegard Nossum <vegard.nossum@gmail.com>, Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org> Cc: Dave Jones <davej@codemonkey.org.uk>, Chris Mason <clm@fb.com>, Jens Axboe <axboe@fb.com>, Andy Lutomirski <luto@amacapital.net>, Andy Lutomirski <luto@kernel.org>, Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>, David Sterba <dsterba@suse.com>, linux-btrfs <linux-btrfs@vger.kernel.org>, Linux Kernel <linux-kernel@vger.kernel.org>, Dave Chinner <david@fromorbit.com> Content-Type: multipart/mixed; boundary=001a1140b544e98c390542ef3b1e Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk

Message ID

CA+55aFwC3nfmum6oPoBRU325CKjK-g3MuqW+bhq3+9DnE_4fgw@mail.gmail.com (mailing list archive)

State

New, archived

Headers

MIME-Version: 1.0
In-Reply-To: <CA+55aFzEG-6pkJXy4itiXsHPu_VV5q2vd6zJLq6vmi=AzdnaUw@mail.gmail.com>
References: <CA+55aFyiJGk-A_cJH41Ec8Xj0Zz9M3EU-igJ9bgusj=nm28tFQ@mail.gmail.com>
	<2bdc068d-afd5-7a78-f334-26970c91aaca@fb.com>
	<CA+55aFxqRCEVxocpZVDMqinP=EsmgzcnvWpOT6x8NMmcBpWb2Q@mail.gmail.com>
	<203e0319-bc9b-245c-e162-709267540d22@fb.com>
	<20161026233808.GC15247@clm-mbp.thefacebook.com>
	<20161026234751.e66xyzjiwifvbuha@codemonkey.org.uk>
	<20161031185514.b22zvbxvga4xcinz@codemonkey.org.uk>
	<CA+55aFwUBYzPcbecpHw9=f8h_JuX18x5bE=Kd_k9QkCcgoiBsA@mail.gmail.com>
	<20161031194454.GA49877@clm-mbp.thefacebook.com>
	<20161123193419.pq7adje2eanky2wx@codemonkey.org.uk>
	<20161123195845.iphzr7ac4mu5ewjt@codemonkey.org.uk>
	<CAOMGZ=HPUoVZ8_yBLsMUHLigvKadKwBio-8gJ5-WM+2Fe4BZ6A@mail.gmail.com>
	<CAOMGZ=EsaGs9B3jc5X+3Sa1vyBfBW6AznYy3JR1=6iJMgCgG0g@mail.gmail.com>
	<CAOMGZ=HfdHLXrRCbzvXcQTgfA-PAErtwZQfPzFr4P8H0MJWZ5g@mail.gmail.com>
	<CA+55aFxeWe2VQaW30qGR0syiZ75jSwFwg3Ac+wS20KDtf5UKNw@mail.gmail.com>
	<CAOMGZ=Eb8mw6CfGaruM-tRir7pWn78vLTaCqp1pGPqkagfzPPA@mail.gmail.com>
	<CA+55aFzEG-6pkJXy4itiXsHPu_VV5q2vd6zJLq6vmi=AzdnaUw@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 5 Dec 2016 12:35:52 -0800
Message-ID: <CA+55aFwC3nfmum6oPoBRU325CKjK-g3MuqW+bhq3+9DnE_4fgw@mail.gmail.com>
Subject: Re: bio linked list corruption.
To: Vegard Nossum <vegard.nossum@gmail.com>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@codemonkey.org.uk>, Chris Mason <clm@fb.com>,
	Jens Axboe <axboe@fb.com>, Andy Lutomirski <luto@amacapital.net>,
	Andy Lutomirski <luto@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Content-Type: multipart/mixed; boundary=001a1140b544e98c390542ef3b1e
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Linus Torvalds Dec. 5, 2016, 8:35 p.m. UTC

Adding the scheduler people to the participants list, and re-attaching
the patch, because while this patch is internal to the VM code, the
issue itself is not.

There might well be other cases where somebody goes "wake_up_all()"
will wake everybody up, so I can put the wait queue head on the stack,
and then after I've woken people up I can return".

Ingo/PeterZ: the reason that does *not* work is that "wake_up_all()"
does make sure that everybody is woken up, but the usual autoremove
wake function only removes the wakeup entry if the process was woken
up by that particular wakeup. If something else had woken it up, the
entry remains on the list, and the waker in this case returned with
the wait head still containing entries.

Which is deadly when the wait queue head is on the stack.

So I'm wondering if we should make that "synchronous_wake_function()"
available generally, and maybe introduce a DEFINE_WAIT_SYNC() helper
that uses it.

Of course, I'm really hoping that this shmem.c use is the _only_ such
case.  But I doubt it.

Comments?

Note for Ingo and Peter: this patch has not been tested at all. But
Vegard did test an earlier patch of mine that just verified that yes,
the issue really was that wait queue entries remained on the wait
queue head just as we were about to return and free it.

           Linus

On Mon, Dec 5, 2016 at 12:10 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, can you try this patch instead? It should actually cause the
> wake_up_all() to always remove all entries, and thus the WARN_ON()
> should no longer happen (and I removed the "list_del()" hackery).
>
>                        Linus

Comments

Vegard Nossum Dec. 5, 2016, 9:33 p.m. UTC | #1

On 5 December 2016 at 21:35, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Note for Ingo and Peter: this patch has not been tested at all. But
> Vegard did test an earlier patch of mine that just verified that yes,
> the issue really was that wait queue entries remained on the wait
> queue head just as we were about to return and free it.

The second patch has been running for 1h+ without any problems of any
kind. I should typically have seen 2 crashes by now. I'll let it run
overnight to be sure.


Vegard
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Vegard Nossum Dec. 6, 2016, 8:42 a.m. UTC | #2

On 5 December 2016 at 22:33, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> On 5 December 2016 at 21:35, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> Note for Ingo and Peter: this patch has not been tested at all. But
>> Vegard did test an earlier patch of mine that just verified that yes,
>> the issue really was that wait queue entries remained on the wait
>> queue head just as we were about to return and free it.
>
> The second patch has been running for 1h+ without any problems of any
> kind. I should typically have seen 2 crashes by now. I'll let it run
> overnight to be sure.

Alright, so nearly 12 hours later I don't see either the new warning
or the original crash at all, so feel free to add:

Tested-by: Vegard Nossum <vegard.nossum@oracle.com>.

That said, my 8 VMs had all panicked in some way due to OOMs (which is
new since v4.8), although some got page allocation stalls for >20s and
died because "khugepaged blocked for more than 120 seconds", others
got "Out of memory and no killable processes".

Vegard
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/mm/shmem.c b/mm/shmem.c
index 166ebf5d2bce..17beb44e9f4f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1848,6 +1848,19 @@  alloc_nohuge:		page = shmem_alloc_and_acct_page(gfp, info, sbinfo,
 	return error;
 }
 
+/*
+ * This is like autoremove_wake_function, but it removes the wait queue
+ * entry unconditionally - even if something else had already woken the
+ * target.
+ */
+static int synchronous_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	int ret = default_wake_function(wait, mode, sync, key);
+	list_del_init(&wait->task_list);
+	return ret;
+}
+
+
 static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vma->vm_file);
@@ -1883,7 +1896,7 @@  static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		    vmf->pgoff >= shmem_falloc->start &&
 		    vmf->pgoff < shmem_falloc->next) {
 			wait_queue_head_t *shmem_falloc_waitq;
-			DEFINE_WAIT(shmem_fault_wait);
+			DEFINE_WAIT_FUNC(shmem_fault_wait, synchronous_wake_function);
 
 			ret = VM_FAULT_NOPAGE;
 			if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
@@ -2665,6 +2678,7 @@  static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
 		wake_up_all(&shmem_falloc_waitq);
+		WARN_ON_ONCE(!list_empty(&shmem_falloc_waitq.task_list));
 		spin_unlock(&inode->i_lock);
 		error = 0;
 		goto out;

bio linked list corruption.

Commit Message

Comments

Patch