From patchwork Wed Jul 11 10:19:54 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10519267
Subject: Re: INFO: task hung in iterate_supers
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: viro@zeniv.linux.org.uk
Cc: syzbot, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    syzkaller-bugs@googlegroups.com
Date: Wed, 11 Jul 2018 19:19:54 +0900
Message-ID: <7dc64116-f43d-3739-7b35-4eef7a80e343@I-love.SAKURA.ne.jp>
In-Reply-To: <0145d376-7ef8-3e17-5a24-94de946a01e5@I-love.SAKURA.ne.jp>
References: <000000000000da8a9b0570a29c01@google.com>
 <0145d376-7ef8-3e17-5a24-94de946a01e5@I-love.SAKURA.ne.jp>

On 2018/07/10 19:34, Tetsuo Handa wrote:
> It turned out that, although the reason v9fs_mount() is stalling is
> currently unknown, the reason many processes are stuck in
> iterate_supers() is that they are unable to take the s->s_umount lock
> due to the down_write_nested() below.
>
>	/*
>	 * sget() can have s_umount recursion.
>	 *
>	 * When it cannot find a suitable sb, it allocates a new
>	 * one (this one), and tries again to find a suitable old
>	 * one.
>	 *
>	 * In case that succeeds, it will acquire the s_umount
>	 * lock of the old one. Since these are clearly distinct
>	 * locks, and this object isn't exposed yet, there's no
>	 * risk of deadlocks.
>	 *
>	 * Annotate this by putting this lock in a different
>	 * subclass.
>	 */
>	down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING);
>
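As an aside, the annotation quoted above only affects lockdep
bookkeeping: the two rwsems are distinct objects, so taking both is
harmless, and the subclass merely stops the validator from reporting
false recursion. A rough userspace model of that "recursion" (plain
pthread rwlocks, no lockdep; all names below are made up for
illustration, not kernel API):

#include <pthread.h>
#include <stdio.h>

struct fake_sb {
	pthread_rwlock_t s_umount;	/* stands in for sb->s_umount */
};

int main(void)
{
	struct fake_sb new_sb, old_sb;

	pthread_rwlock_init(&new_sb.s_umount, NULL);
	pthread_rwlock_init(&old_sb.s_umount, NULL);

	/* alloc_super(): the new, not yet exposed sb is locked for write */
	pthread_rwlock_wrlock(&new_sb.s_umount);
	/* sget() retries, finds a suitable old sb, and locks that one too */
	pthread_rwlock_wrlock(&old_sb.s_umount);

	puts("two distinct s_umount locks held; no deadlock possible");

	pthread_rwlock_unlock(&old_sb.s_umount);
	pthread_rwlock_unlock(&new_sb.s_umount);
	return 0;
}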
Judging from commit dabe0dc194d5d56d ("vfs: fix the rest of sget()
races"), this appears to be an overlooked sget() race. iterate_supers()
calls down_read(&sb->s_umount) on every superblock it finds while
traversing the super_blocks list. But sget_userns() adds a newly
created superblock to the super_blocks list, with ->s_umount held for
write by alloc_super(), before mount_fs() sets the SB_BORN flag upon
return from type->mount(). Therefore, if type->mount() (e.g.
v9fs_mount()) takes a long time after sget() for some reason, processes
will block on a superblock whose SB_BORN flag is not yet set.

Doing the test (which is currently done only after ->s_umount has been
taken) before taking ->s_umount (patch shown below) works for me.
Although there seem to be many bugs in the v9fs code, I think that
making sure other threads do not wait on a superblock whose SB_BORN
flag is not yet set is the better fix.
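To make the window concrete, here is a rough standalone model of the
race (plain pthreads; a sleep() stands in for the stalled v9fs_mount(),
and an atomic flag stands in for SB_BORN because this model has no
sb_lock; all names are made up for illustration):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* One fake superblock, already on the (implied) super_blocks list. */
struct fake_sb {
	pthread_rwlock_t s_umount;
	atomic_int born;		/* stands in for SB_BORN */
};

static struct fake_sb sb;

static void *mounter(void *arg)
{
	pthread_rwlock_wrlock(&sb.s_umount);	/* alloc_super()           */
	sleep(2);				/* type->mount() stalls... */
	atomic_store(&sb.born, 1);		/* mount_fs() sets SB_BORN */
	pthread_rwlock_unlock(&sb.s_umount);
	return NULL;
}

static void *iterator(void *arg)
{
	sleep(1);	/* let the mounter take s_umount first */

	/* The pre-check the patch adds: do not even queue on the
	 * s_umount of a superblock that is not fully born. */
	if (!atomic_load(&sb.born)) {
		puts("iterator: sb not born yet, skipped (no hang)");
		return NULL;
	}
	/* Without the pre-check we would block here until the stalled
	 * mount completes -- the hang syzbot reported. */
	pthread_rwlock_rdlock(&sb.s_umount);
	pthread_rwlock_unlock(&sb.s_umount);
	return NULL;
}

int main(void)
{
	pthread_t m, i;

	pthread_rwlock_init(&sb.s_umount, NULL);
	pthread_create(&m, NULL, mounter, NULL);
	pthread_create(&i, NULL, iterator, NULL);
	pthread_join(i, NULL);
	pthread_join(m, NULL);
	return 0;
}

With the pre-check in place the iterator skips the half-born superblock
instead of queueing behind the stalled mount.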
---
 fs/super.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 50728d9..03c3b3f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -37,6 +37,11 @@
 #include <linux/user_namespace.h>
 #include "internal.h"
 
+static inline bool sb_is_alive(struct super_block *sb)
+{
+	return sb->s_root && (sb->s_flags & SB_BORN);
+}
+
 static int thaw_super_locked(struct super_block *sb);
 
 static LIST_HEAD(super_blocks);
@@ -407,8 +412,7 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
 bool trylock_super(struct super_block *sb)
 {
 	if (down_read_trylock(&sb->s_umount)) {
-		if (!hlist_unhashed(&sb->s_instances) &&
-		    sb->s_root && (sb->s_flags & SB_BORN))
+		if (!hlist_unhashed(&sb->s_instances) && sb_is_alive(sb))
 			return true;
 		up_read(&sb->s_umount);
 	}
@@ -590,6 +594,8 @@ static void __iterate_supers(void (*f)(struct super_block *))
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (!sb_is_alive(sb))
+			continue;
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		sb->s_count++;
@@ -620,13 +626,15 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (!sb_is_alive(sb))
+			continue;
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
 		down_read(&sb->s_umount);
-		if (sb->s_root && (sb->s_flags & SB_BORN))
+		if (sb_is_alive(sb))
 			f(sb, arg);
 		up_read(&sb->s_umount);
 
@@ -656,11 +664,13 @@ void iterate_supers_type(struct file_system_type *type,
 
 	spin_lock(&sb_lock);
 	hlist_for_each_entry(sb, &type->fs_supers, s_instances) {
+		if (!sb_is_alive(sb))
+			continue;
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
 		down_read(&sb->s_umount);
-		if (sb->s_root && (sb->s_flags & SB_BORN))
+		if (sb_is_alive(sb))
 			f(sb, arg);
 		up_read(&sb->s_umount);
 
@@ -689,6 +699,8 @@ static struct super_block *__get_super(struct block_device *bdev, bool excl)
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		if (sb->s_bdev == bdev) {
+			if (!sb_is_alive(sb))
+				continue;
 			sb->s_count++;
 			spin_unlock(&sb_lock);
 			if (!excl)
@@ -696,7 +708,7 @@ static struct super_block *__get_super(struct block_device *bdev, bool excl)
 			else
 				down_write(&sb->s_umount);
 			/* still alive? */
-			if (sb->s_root && (sb->s_flags & SB_BORN))
+			if (sb_is_alive(sb))
 				return sb;
 			if (!excl)
 				up_read(&sb->s_umount);
@@ -813,11 +825,13 @@ struct super_block *user_get_super(dev_t dev)
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		if (sb->s_dev == dev) {
+			if (!sb_is_alive(sb))
+				continue;
 			sb->s_count++;
 			spin_unlock(&sb_lock);
 			down_read(&sb->s_umount);
 			/* still alive? */
-			if (sb->s_root && (sb->s_flags & SB_BORN))
+			if (sb_is_alive(sb))
 				return sb;
 			up_read(&sb->s_umount);
 			/* nope, got unmounted */
@@ -916,8 +930,7 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data, int force)
 static void do_emergency_remount_callback(struct super_block *sb)
 {
 	down_write(&sb->s_umount);
-	if (sb->s_root && sb->s_bdev && (sb->s_flags & SB_BORN) &&
-	    !sb_rdonly(sb)) {
+	if (sb_is_alive(sb) && sb->s_bdev && !sb_rdonly(sb)) {
 		/*
 		 * What lock protects sb->s_flags??
 		 */
@@ -947,7 +960,7 @@ void emergency_remount(void)
 static void do_thaw_all_callback(struct super_block *sb)
 {
 	down_write(&sb->s_umount);
-	if (sb->s_root && sb->s_flags & SB_BORN) {
+	if (sb_is_alive(sb)) {
 		emergency_thaw_bdev(sb);
 		thaw_super_locked(sb);
 	} else {
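Note that the patch keeps the existing test after ->s_umount has been
acquired and only adds a second instance of it before blocking: the
check done under sb_lock is merely an optimization that stops threads
from queueing on a half-born superblock, while the re-check under
->s_umount stays authoritative, since the superblock can still be
unmounted between dropping sb_lock and obtaining ->s_umount.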