From patchwork Wed Dec 6 06:57:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10094969
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2] kyber: fix another domain token wait queue hang
Date: Tue, 5 Dec 2017 22:57:43 -0800
X-Mailer: git-send-email 2.15.1
From: Omar Sandoval

Commit 8cf466602028 ("kyber: fix hang on domain token wait queue") fixed
a hang caused by leaving wait entries on the domain token wait queue
after the __sbitmap_queue_get() retry succeeded, making that wait entry
a "dud" which won't in turn wake more entries up. However, we can also
get a dud entry if kyber_get_domain_token() fails once but is then
called again and succeeds. This can happen if the hardware queue is
rerun for some other reason, or, more likely, kyber_dispatch_request()
tries the same domain twice.

The fix is to remove our entry from the wait queue whenever we
successfully get a token. The only complication is that we might be on
one of many wait queues in the struct sbitmap_queue, but that's easily
fixed by remembering which wait queue we were put on.

While we're here, only initialize the wait queue entry once instead of
on every wait, and use spin_lock_irq() instead of spin_lock_irqsave(),
since this is always called from process context with irqs enabled.

Signed-off-by: Omar Sandoval
---
Changes from v1:
- Fixed the unintentional change from add_wait_queue() to
  add_wait_queue_exclusive(). This is likely the second bug I was
  chasing, but my stress run is still going.
- Fixed a commit message typo.

 block/kyber-iosched.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index b4df317c2916..f95c60774ce8 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -100,9 +100,13 @@ struct kyber_hctx_data {
 	unsigned int cur_domain;
 	unsigned int batching;
 	wait_queue_entry_t domain_wait[KYBER_NUM_DOMAINS];
+	struct sbq_wait_state *domain_ws[KYBER_NUM_DOMAINS];
 	atomic_t wait_index[KYBER_NUM_DOMAINS];
 };
 
+static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
+			     void *key);
+
 static int rq_sched_domain(const struct request *rq)
 {
 	unsigned int op = rq->cmd_flags;
@@ -385,6 +389,9 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
 		INIT_LIST_HEAD(&khd->rqs[i]);
+		init_waitqueue_func_entry(&khd->domain_wait[i],
+					  kyber_domain_wake);
+		khd->domain_wait[i].private = hctx;
 		INIT_LIST_HEAD(&khd->domain_wait[i].entry);
 		atomic_set(&khd->wait_index[i], 0);
 	}
@@ -524,35 +531,39 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
 	int nr;
 
 	nr = __sbitmap_queue_get(domain_tokens);
-	if (nr >= 0)
-		return nr;
 
 	/*
 	 * If we failed to get a domain token, make sure the hardware queue is
 	 * run when one becomes available. Note that this is serialized on
 	 * khd->lock, but we still need to be careful about the waker.
 	 */
-	if (list_empty_careful(&wait->entry)) {
-		init_waitqueue_func_entry(wait, kyber_domain_wake);
-		wait->private = hctx;
+	if (nr < 0 && list_empty_careful(&wait->entry)) {
 		ws = sbq_wait_ptr(domain_tokens,
 				  &khd->wait_index[sched_domain]);
+		khd->domain_ws[sched_domain] = ws;
 		add_wait_queue(&ws->wait, wait);
 
 		/*
 		 * Try again in case a token was freed before we got on the wait
-		 * queue. The waker may have already removed the entry from the
-		 * wait queue, but list_del_init() is okay with that.
+		 * queue.
 		 */
 		nr = __sbitmap_queue_get(domain_tokens);
-		if (nr >= 0) {
-			unsigned long flags;
+	}
 
-			spin_lock_irqsave(&ws->wait.lock, flags);
-			list_del_init(&wait->entry);
-			spin_unlock_irqrestore(&ws->wait.lock, flags);
-		}
+	/*
+	 * If we got a token while we were on the wait queue, remove ourselves
+	 * from the wait queue to ensure that all wake ups make forward
+	 * progress. It's possible that the waker already deleted the entry
+	 * between the !list_empty_careful() check and us grabbing the lock, but
+	 * list_del_init() is okay with that.
+	 */
+	if (nr >= 0 && !list_empty_careful(&wait->entry)) {
+		ws = khd->domain_ws[sched_domain];
+		spin_lock_irq(&ws->wait.lock);
+		list_del_init(&wait->entry);
+		spin_unlock_irq(&ws->wait.lock);
 	}
+
 	return nr;
 }
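
[Editor's note, not part of the patch] For readers who want to poke at the
token/wait-queue logic outside the kernel tree, below is a minimal userspace
sketch of the invariant the patch enforces in kyber_get_domain_token(): a
caller that ends up holding a token must not remain on a wait list, or its
stale "dud" entry will absorb a later wakeup without making progress. Every
name here (token_pool, consumer, try_get, get_token, ...) is invented for
illustration; only the control flow mirrors the kernel function, and the
locking is omitted since the sketch is single-threaded.

/* kyber-token-sketch.c: build with "cc -o sketch kyber-token-sketch.c". */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct wait_entry {
	struct wait_entry *next, *prev;
};

struct wait_list {
	struct wait_entry head;		/* stands in for one sbq_wait_state */
};

struct token_pool {
	int free_tokens;
	struct wait_list waits[2];	/* sbitmap_queue has several wait lists */
	int wait_index;			/* round-robin pick, like sbq_wait_ptr() */
};

struct consumer {
	struct wait_entry wait;
	struct wait_list *ws;		/* remembers the list we joined, like khd->domain_ws[] */
};

static void list_init(struct wait_entry *e) { e->next = e->prev = e; }
static bool list_is_empty(const struct wait_entry *e) { return e->next == e; }

static void list_add(struct wait_list *wl, struct wait_entry *e)
{
	e->next = wl->head.next;
	e->prev = &wl->head;
	wl->head.next->prev = e;
	wl->head.next = e;
}

static void list_del_init_entry(struct wait_entry *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	list_init(e);
}

/* Like __sbitmap_queue_get(): a token number on success, -1 if exhausted. */
static int try_get(struct token_pool *pool)
{
	if (pool->free_tokens > 0)
		return --pool->free_tokens;
	return -1;
}

/* Control flow of kyber_get_domain_token() after this patch. */
static int get_token(struct token_pool *pool, struct consumer *c)
{
	int nr = try_get(pool);

	if (nr < 0 && list_is_empty(&c->wait)) {
		/* Pick a wait list and remember which one we joined. */
		c->ws = &pool->waits[pool->wait_index++ % 2];
		list_add(c->ws, &c->wait);
		/* Retry in case a token was freed before we joined the list. */
		nr = try_get(pool);
	}

	/* The fix: holding a token means we must also be off the wait list. */
	if (nr >= 0 && !list_is_empty(&c->wait))
		list_del_init_entry(&c->wait);

	return nr;
}

int main(void)
{
	struct token_pool pool = { .free_tokens = 0 };
	struct consumer c = { .ws = NULL };
	int nr;

	list_init(&pool.waits[0].head);
	list_init(&pool.waits[1].head);
	list_init(&c.wait);

	nr = get_token(&pool, &c);	/* no tokens: we end up on a wait list */
	assert(nr < 0 && !list_is_empty(&c.wait));

	pool.free_tokens = 1;		/* someone frees a token */
	nr = get_token(&pool, &c);	/* this call succeeds... */
	assert(nr == 0);
	assert(list_is_empty(&c.wait));	/* ...and the dud wait entry is gone */

	printf("got token %d; wait entry removed\n", nr);
	return 0;
}

Without the final list removal, the second get_token() call would return a
token while leaving c.wait queued, which is exactly the "dud" entry the
commit message describes: a later wakeup would land on it and wake nobody
who actually needs a token.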