From patchwork Fri Nov 11 19:24:12 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 9423591
Subject: Re: Regression: nvme timeouts and oopses
To: Bart Van Assche
References: <6cd8bb58-6749-02e2-ee27-46d8813cbfd2@kernel.dk>
Cc: "linux-block@vger.kernel.org", Christoph Hellwig, Keith Busch
From: Jens Axboe
Message-ID: <01578594-1b99-a80d-8032-c18a67f1ed3f@kernel.dk>
Date: Fri, 11 Nov 2016 12:24:12 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
In-Reply-To: <6cd8bb58-6749-02e2-ee27-46d8813cbfd2@kernel.dk>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

On 11/11/2016 12:18 PM, Jens
Axboe wrote:
> On 11/11/2016 11:47 AM, Jens Axboe wrote:
>> Hi,
>>
>> I've been running into problems when stability testing my 4.10 branch,
>> and I finally got an easy reproducer today (on the laptop, no less) and
>> was able to bisect it. Boils down to this:
>>
>> 2253efc850c4cf690516bbc07854eeb1077202ba is the first bad commit
>> commit 2253efc850c4cf690516bbc07854eeb1077202ba
>> Author: Bart Van Assche
>> Date:   Fri Oct 28 17:20:02 2016 -0700
>>
>>     blk-mq: Move more code into blk_mq_direct_issue_request()
>
> I think I see what it is - you're grabbing a request off the plug list,
> and then you assume that it's the same hardware queue. That's not safe.
> So you end up issuing a request from hwq A to hwq B, oops. Let me test
> that theory with a patch.

Yep that's it. With the below patch, things work fine again.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d180c989a0e5..77110aed24ea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1291,11 +1291,11 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	return rq;
 }
 
-static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-				      struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
 {
 	int ret;
 	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
 	struct blk_mq_queue_data bd = {
 		.rq = rq,
 		.list = NULL,
@@ -1414,11 +1414,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
 			rcu_read_lock();
-			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie);
 			rcu_read_unlock();
 		} else {
 			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
-			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie);
 			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
 		}
 		goto done;
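
For illustration only, here is a small user-space sketch of the theory in this mail: a request taken off the plug list may belong to a different hardware queue than the one the caller is currently working with, so the queue has to be derived from the request itself. Everything named toy_* is hypothetical and only models the idea; it is not the kernel code, and the CPU-to-queue mapping is an assumption made up for the example. The mail does not say why the two queues end up differing, so the sketch simply takes the mismatch as given.

```c
#include <assert.h>

#define TOY_NR_CPUS 4
#define TOY_NR_HWQ  2

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_hw_queue {
	int id;
	int issued;	/* requests issued on this hardware queue */
	int foreign;	/* requests issued here that map to another queue */
};

struct toy_request {
	int cpu;	/* CPU of the software context the request came from */
};

static struct toy_hw_queue toy_hwq[TOY_NR_HWQ] = { { .id = 0 }, { .id = 1 } };

/* Assumed toy mapping: CPUs 0-1 use hwq 0, CPUs 2-3 use hwq 1. */
static struct toy_hw_queue *toy_map_queue(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return &toy_hwq[cpu / (TOY_NR_CPUS / TOY_NR_HWQ)];
}

/* Issue rq on hctx and record whether it landed on a queue it does not map to. */
static void toy_issue(struct toy_hw_queue *hctx, const struct toy_request *rq)
{
	hctx->issued++;
	if (hctx != toy_map_queue(rq->cpu))
		hctx->foreign++;
}

/* Pattern before the patch: trust the hardware queue handed in by the caller. */
static void toy_issue_with_caller_hctx(struct toy_hw_queue *hctx,
				       const struct toy_request *rq)
{
	toy_issue(hctx, rq);
}

/* Pattern after the patch: look the hardware queue up from the request. */
static void toy_issue_from_request(const struct toy_request *rq)
{
	toy_issue(toy_map_queue(rq->cpu), rq);
}
```

If the caller's current queue comes from CPU 3 while the plugged request came from CPU 0, toy_issue_with_caller_hctx() puts the request on hwq 1 even though it maps to hwq 0, which is the "hwq A to hwq B" case. toy_issue_from_request() cannot get this wrong because it never takes a queue argument.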