From patchwork Wed Aug 24 20:36:38 2016
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 9298283
Subject: Re: Oops when completing request on the wrong queue
To: Gabriel Krisman Bertazi
References: <87a8gltgks.fsf@linux.vnet.ibm.com>
 <871t1kq455.fsf@linux.vnet.ibm.com>
 <8fc9ae38-9488-ef52-f620-08499edebffa@kernel.dk>
 <87shu0hfye.fsf@linux.vnet.ibm.com>
 <87a8g39pg4.fsf@linux.vnet.ibm.com>
 <43693064-dd37-92ce-7753-2a8edb43eab5@kernel.dk>
 <164a4c63-065b-b766-36f3-bcef4aa46a38@kernel.dk>
 <49a954e6-2f96-8a63-ce15-2c82c1a1d36d@kernel.dk>
Cc: Keith Busch, Christoph Hellwig, linux-nvme@lists.infradead.org,
 Brian King, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org
From: Jens Axboe
Date: Wed, 24 Aug 2016 14:36:38 -0600
In-Reply-To: <49a954e6-2f96-8a63-ce15-2c82c1a1d36d@kernel.dk>
X-Mailing-List: linux-block@vger.kernel.org

On 08/24/2016 12:34 PM, Jens Axboe wrote:
> On 08/23/2016 03:14 PM, Jens Axboe wrote:
>> On 08/23/2016 03:11 PM, Jens Axboe wrote:
>>> On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote:
>>>> Gabriel Krisman Bertazi writes:
>>>>
>>>>>> Can you share what you ran to online/offline CPUs? I can't
>>>>>> reproduce this here.
>>>>>
>>>>> I was using the ppc64_cpu tool, which shouldn't do anything more
>>>>> than write to sysfs, but I just reproduced it with the script
>>>>> below.
>>>>>
>>>>> Note that this is ppc64le. I don't have an x86 box at hand to
>>>>> attempt a reproduction right now, but I'll look for one and see
>>>>> how it goes.
>>>>
>>>> Hi,
>>>>
>>>> Any luck reproducing it? We were initially reproducing with a
>>>> proprietary stress test, but I gave a generated fio jobfile a try,
>>>> combined with the SMT script I shared earlier, and I could
>>>> reproduce the crash consistently in less than 10 minutes of
>>>> execution. This was still ppc64le, though; I couldn't get my hands
>>>> on NVMe on x86 yet.
>>>
>>> Nope, I have not been able to reproduce it. How long do the CPU
>>> offline/online actions take on ppc64? They are pretty slow on x86,
>>> which may hide the issue. I took out the various printk's associated
>>> with bringing a CPU off/online, as well as the IRQ-breaking parts,
>>> but that didn't help in reproducing it.
>>>
>>>> The job file I used, as well as the smt.sh script, in case you want
>>>> to give it a try:
>>>>
>>>> jobfile: http://krisman.be/k/nvmejob.fio
>>>> smt.sh: http://krisman.be/k/smt.sh
>>>>
>>>> Still, the trigger seems to be consistently a heavy load of IO
>>>> combined with CPU addition/removal.
>>>
>>> My workload looks similar to yours, in that it's high depth and uses
>>> a lot of jobs to keep most CPUs loaded. My bash script is different
>>> from yours; I'll try yours and see if that helps here.
>>
>> Actually, I take that back. You're not using O_DIRECT, hence all your
>> jobs are running at QD=1, not the 256 specified. That looks odd, but
>> I'll try it; maybe it'll hit something different.
>
> Can you try this patch? It's not perfect, but I'll be interested in
> whether it makes a difference for you.
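A side note on the QD=1 observation above: with buffered I/O, libaio
completes synchronously inside io_submit(), so an iodepth=256 setting
never turns into real queue depth on the device. A jobfile fragment
along these lines - illustrative only; the device path and job
parameters are assumptions, not the contents of the actual nvmejob.fio
- is what it takes to drive the specified depth:

  ; illustrative fio fragment, assumed parameters
  [global]
  ioengine=libaio
  direct=1        ; without O_DIRECT, libaio degrades to sync, QD=1
  iodepth=256     ; only effective once direct=1 is set
  rw=randread
  bs=4k
  numjobs=32

  [nvme-hotplug-stress]
  filename=/dev/nvme0n1

With direct=1 in place, the smt.sh hotplug loop then races against a
device that really has 256 requests in flight per job, which should
make the window easier to hit.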
This one should handle the WARN_ON() for running the hw queue on the
wrong CPU as well.

 	/*
@@ -1075,15 +1082,11 @@ static void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx,
 }
 
 void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
-		bool async)
+			   bool async)
 {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct request_queue *q = rq->q;
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
-
-	current_ctx = blk_mq_get_ctx(q);
-	if (!cpu_online(ctx->cpu))
-		rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
@@ -1093,8 +1096,6 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 
 	if (run_queue)
 		blk_mq_run_hw_queue(hctx, async);
-
-	blk_mq_put_ctx(current_ctx);
 }
 
 static void blk_mq_insert_requests(struct request_queue *q,
@@ -1105,14 +1106,9 @@ static void blk_mq_insert_requests(struct request_queue *q,
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *current_ctx;
 
 	trace_block_unplug(q, depth, !from_schedule);
 
-	current_ctx = blk_mq_get_ctx(q);
-
-	if (!cpu_online(ctx->cpu))
-		ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	/*
@@ -1125,14 +1121,12 @@ static void blk_mq_insert_requests(struct request_queue *q,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
-		rq->mq_ctx = ctx;
 		__blk_mq_insert_req_list(hctx, ctx, rq, false);
 	}
 
 	blk_mq_hctx_mark_pending(hctx, ctx);
 	spin_unlock(&ctx->lock);
 
 	blk_mq_run_hw_queue(hctx, from_schedule);
-	blk_mq_put_ctx(current_ctx);
 }
 
@@ -1692,6 +1686,11 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 	while (!list_empty(&tmp)) {
 		struct request *rq;
 
+		/*
+		 * FIXME: we can't just move the req here. We'd have to
+		 * pull off the bio chain and add it to a new request
+		 * on the target hw queue
+		 */
 		rq = list_first_entry(&tmp, struct request, queuelist);
 		rq->mq_ctx = ctx;
 		list_move_tail(&rq->queuelist, &ctx->rq_list);

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 758a9b5..b21a9b9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -810,11 +810,12 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
-
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+		cpu_online(hctx->next_cpu));
+
 	hctx->run++;
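To make the FIXME in the CPU-offline hunk concrete: the requests on the
dead ctx can't simply be moved, because each request's tag was allocated
from the hw queue that is going away. A real fix would detach the bio
chain from each request and resubmit it, letting the normal submission
path allocate a fresh request (and tag) on a live hw queue. A rough
sketch of that shape - blk_mq_requeue_via_bios() is a made-up name, and
error handling, accounting, and locking are all omitted:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical helper: strip the bio chain off a request whose hw queue
 * is going offline and resubmit the bios. generic_make_request() will
 * re-enter blk-mq and allocate a new request on a hw queue whose CPUs
 * are still online; the old request (and its tag) goes back to the pool.
 */
static void blk_mq_requeue_via_bios(struct request *rq)
{
	struct bio *bio = rq->bio;

	rq->bio = rq->biotail = NULL;

	while (bio) {
		struct bio *next = bio->bi_next;

		bio->bi_next = NULL;
		generic_make_request(bio);
		bio = next;
	}

	blk_mq_free_request(rq);
}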