From patchwork Mon Sep 18 22:03:06 2017
From: Keith Busch <keith.busch@intel.com>
To: linux-block@vger.kernel.org, Jens Axboe, Christoph Hellwig
Cc: Keith Busch <keith.busch@intel.com>
Subject: [RFC PATCH] blk-mq: Fix lost request during timeout
Date: Mon, 18 Sep 2017 18:03:06 -0400
Message-Id: <1505772186-4601-1-git-send-email-keith.busch@intel.com>

I think we've always known it's possible to lose a request during timeout
handling, but we have just accepted that possibility. It seems to be causing
problems, though, leading to unnecessary error escalation and IO failures.

The possibility arises when the block layer marks the request complete prior
to running the timeout handler. If that request happens to complete while the
handler is running, the request will be lost, inevitably triggering a second
timeout.

This patch attempts to shorten the window for this race condition by clearing
the started flag when the driver completes a request. The block layer's
timeout handler will then complete the command if it observes the started
flag is no longer set.

Note it's possible to lose the command even with this patch. It's just less
likely to happen.
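To make the ordering above concrete, here is a minimal userspace sketch
(plain C11, not kernel code) of the problematic interleaving. The bit
operations stand in for blk_mark_rq_complete()/blk_clear_rq_complete() on
REQ_ATOM_COMPLETE; the names and the hard-coded step order are illustrative
assumptions, not the actual blk-mq call chain.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum { ATOM_STARTED, ATOM_COMPLETE };

static atomic_ulong rq_flags;

/* Stand-in for blk_mark_rq_complete(): test-and-set of the complete bit.
 * Returns true if the bit was already set, i.e. someone else owns completion. */
static bool mark_complete(void)
{
	return atomic_fetch_or(&rq_flags, 1UL << ATOM_COMPLETE) & (1UL << ATOM_COMPLETE);
}

/* Stand-in for blk_clear_rq_complete(). */
static void clear_complete(void)
{
	atomic_fetch_and(&rq_flags, ~(1UL << ATOM_COMPLETE));
}

int main(void)
{
	atomic_fetch_or(&rq_flags, 1UL << ATOM_STARTED);	/* request issued */

	/* 1. Timer fires: the timeout path wins the race for the complete bit
	 *    and invokes the driver's timeout handler. */
	bool timeout_owns = !mark_complete();

	/* 2. While the handler runs, the device answers and the driver's
	 *    completion path tries to complete the request; the complete bit
	 *    is already set, so the completion is silently dropped. */
	bool irq_completed = !mark_complete();

	/* 3. The handler asks for the timer to be reset: the complete bit is
	 *    cleared and the timer re-armed, but the completion from step 2
	 *    is gone, so the request can only time out a second time. */
	if (timeout_owns)
		clear_complete();

	printf("timeout path owned the request: %d\n", timeout_owns);	/* 1 */
	printf("irq path completed the request: %d\n", irq_completed);	/* 0 -> lost */
	return 0;
}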
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a1860..37144ef 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -566,6 +566,7 @@ void blk_mq_complete_request(struct request *rq)
 
 	if (unlikely(blk_should_fake_timeout(q)))
 		return;
+	clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
 	if (!blk_mark_rq_complete(rq))
 		__blk_mq_complete_request(rq);
 }
@@ -605,19 +606,19 @@ void blk_mq_start_request(struct request *rq)
 	 * complete. So be sure to clear complete again when we start
 	 * the request, otherwise we'll ignore the completion event.
 	 */
-	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
 		set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
+		if (q->dma_drain_size && blk_rq_bytes(rq)) {
+			/*
+			 * Make sure space for the drain appears. We know we can do
+			 * this because max_hw_segments has been adjusted to be one
+			 * fewer than the device can handle.
+			 */
+			rq->nr_phys_segments++;
+		}
+	}
 	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
 		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
-
-	if (q->dma_drain_size && blk_rq_bytes(rq)) {
-		/*
-		 * Make sure space for the drain appears. We know we can do
-		 * this because max_hw_segments has been adjusted to be one
-		 * fewer than the device can handle.
-		 */
-		rq->nr_phys_segments++;
-	}
 }
 EXPORT_SYMBOL(blk_mq_start_request);
 
@@ -637,11 +638,6 @@ static void __blk_mq_requeue_request(struct request *rq)
 	trace_block_rq_requeue(q, rq);
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
 	blk_mq_sched_requeue_request(rq);
-
-	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
-		if (q->dma_drain_size && blk_rq_bytes(rq))
-			rq->nr_phys_segments--;
-	}
 }
 
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
@@ -763,10 +759,15 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
-		blk_add_timer(req);
-		blk_clear_rq_complete(req);
-		break;
+		if (test_bit(REQ_ATOM_STARTED, &req->atomic_flags)) {
+			blk_add_timer(req);
+			blk_clear_rq_complete(req);
+			break;
+		}
+		/* Fall through */
 	case BLK_EH_NOT_HANDLED:
+		if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
+			__blk_mq_complete_request(req);
 		break;
 	default:
 		printk(KERN_ERR "block: bad eh return: %d\n", ret);