From patchwork Sun Jun 10 20:38:24 2018
X-Patchwork-Submitter: Roman Pen
X-Patchwork-Id: 10456761
From: Roman Pen
To: linux-block@vger.kernel.org
Cc: Jinpu Wang, Gi-Oh Kim, Danil Kipnis, Roman Pen, Jens Axboe,
    Bart Van Assche, Christoph Hellwig, Sagi Grimberg, Ming Lei
Subject: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
Date: Sun, 10 Jun 2018 22:38:24 +0200
Message-Id: <20180610203824.16512-1-roman.penyaev@profitbricks.com>
X-Mailer: git-send-email 2.13.1
X-Mailing-List: linux-block@vger.kernel.org

It is not allowed to reinit the q->tag_set_list list entry while an RCU
grace period has not yet completed; otherwise the following soft
lockup in blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664] <IRQ>
[ 1064.256824] blk_mq_free_request+0xea/0x100
[ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007] irq_poll_softirq+0xb7/0xe0
[ 1064.258165] __do_softirq+0x106/0x2a2
[ 1064.258328] irq_exit+0x92/0xa0
[ 1064.258509] do_IRQ+0x4a/0xd0
[ 1064.258660] common_interrupt+0x7a/0x7a
[ 1064.258818] </IRQ>

Meanwhile another context frees another queue, but one with the same set
of shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315] schedule+0x32/0x80
[ 1288.202462] schedule_timeout+0x1e5/0x380
[ 1288.203838] wait_for_completion+0xb0/0x120
[ 1288.204137] __wait_rcu_gp+0x125/0x160
[ 1288.204287] synchronize_sched+0x6e/0x80
[ 1288.204770] blk_mq_free_queue+0x74/0xe0
[ 1288.204922] blk_cleanup_queue+0xc7/0x110
[ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548] kernfs_fop_write+0x109/0x180
[ 1288.206328] vfs_write+0xb3/0x1a0
[ 1288.206476] SyS_write+0x52/0xc0
[ 1288.206624] do_syscall_64+0x68/0x1d0
[ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.

2. One queue is about to be freed, and the task is now in
   blk_mq_del_queue_tag_set().

3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
   the tag list in order to find an hctx to restart.

Because the linked list entry was modified in blk_mq_del_queue_tag_set()
without properly waiting for a grace period, blk_mq_sched_restart()
never terminates, spinning in list_for_each_entry_rcu_rr(); thus the
soft lockup.

The fix is simple: reinit the list entry only after an RCU grace period
has elapsed.

Signed-off-by: Roman Pen
Cc: Jens Axboe
Cc: Bart Van Assche
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: Ming Lei
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig
Reviewed-by: Ming Lei
Reviewed-by: Bart Van Assche
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..2a40d60950f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,