From patchwork Tue Apr 10 21:54:56 2018
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10333887
Date: Tue, 10 Apr 2018 14:54:56 -0700
From: "tj@kernel.org"
To: Bart Van Assche
Cc: "ming.lei@redhat.com", "hch@lst.de", "maxg@mellanox.com",
 "israelr@mellanox.com", "linux-block@vger.kernel.org",
 "stable@vger.kernel.org", "sagi@grimberg.me", "axboe@kernel.dk"
Subject: Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling
Message-ID: <20180410215456.GD793541@devbig577.frc2.facebook.com>
References: <20180410135541.GA22340@ming.t460p>
 <7c4a8b019182d5a9259ef20e6462b3f6a533abed.camel@wdc.com>
 <20180410143007.GB22340@ming.t460p>
 <20180410152553.GC22340@ming.t460p>
 <20180410153031.GO3126663@devbig577.frc2.facebook.com>
 <20180410153812.GA3219@ming.t460p>
 <20180410154043.GP3126663@devbig577.frc2.facebook.com>
 <20180410213323.GC793541@devbig577.frc2.facebook.com>
X-Mailing-List: linux-block@vger.kernel.org

On Tue, Apr 10, 2018 at 09:46:23PM +0000, Bart Van Assche wrote:
> On Tue, 2018-04-10 at 14:33 -0700, tj@kernel.org wrote:
> > +	else
> > +		rq->missed_completion = true;
>
> In this patch I found code that sets rq->missed_completion but no code
> that clears it.  Did I perhaps miss something?

Ah, yeah, I was moving it out of add_timer but forgot to actually add it
to the issue path.  Fixed patch below.  BTW, no matter what we do with
the request handover between normal and timeout paths, we'd need
something similar.  Otherwise, while we can reduce the window, we can't
get rid of it.

Thanks.
---
 block/blk-mq.c         |   45 ++++++++++++++++++++++++++++++++-------------
 include/linux/blkdev.h |    2 ++
 2 files changed, 34 insertions(+), 13 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -642,6 +642,8 @@ void blk_mq_complete_request(struct requ
 	hctx_lock(hctx, &srcu_idx);
 	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
 		__blk_mq_complete_request(rq);
+	else
+		rq->missed_completion = true;
 	hctx_unlock(hctx, srcu_idx);
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
@@ -684,6 +686,7 @@ void blk_mq_start_request(struct request
 
 	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
 	blk_add_timer(rq);
+	rq->missed_completion = false;
 
 	write_seqcount_end(&rq->gstate_seq);
 	preempt_enable();
@@ -881,7 +884,7 @@ static void blk_mq_check_expired(struct 
 	}
 }
 
-static void blk_mq_timeout_sync_rcu(struct request_queue *q)
+static void blk_mq_timeout_sync_rcu(struct request_queue *q, bool clear)
 {
 	struct blk_mq_hw_ctx *hctx;
 	bool has_rcu = false;
@@ -896,7 +899,8 @@ static void blk_mq_timeout_sync_rcu(stru
 		else
 			synchronize_srcu(hctx->srcu);
 
-		hctx->need_sync_rcu = false;
+		if (clear)
+			hctx->need_sync_rcu = false;
 	}
 	if (has_rcu)
 		synchronize_rcu();
@@ -917,25 +921,37 @@ static void blk_mq_terminate_expired(str
 		blk_mq_rq_timed_out(hctx, rq, priv, reserved);
 }
 
-static void blk_mq_finish_timeout_reset(struct blk_mq_hw_ctx *hctx,
+static void blk_mq_timeout_reset_return(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
 	/*
 	 * @rq's timer reset has gone through rcu synchronization and is
 	 * visible now.  Allow normal completions again by resetting
 	 * ->aborted_gstate.  Don't clear RQF_MQ_TIMEOUT_RESET here as
-	 * there's no memory ordering around ->aborted_gstate making it the
-	 * only field safe to update.  Let blk_add_timer() clear it later
-	 * when the request is recycled or times out again.
-	 *
-	 * As nothing prevents from completion happening while
-	 * ->aborted_gstate is set, this may lead to ignored completions
-	 * and further spurious timeouts.
+	 * blk_mq_timeout_reset_cleanup() needs it again and there's no
+	 * memory ordering around ->aborted_gstate making it the only field
+	 * safe to update.  Let blk_add_timer() clear it later when the
+	 * request is recycled or times out again.
 	 */
 	if (rq->rq_flags & RQF_MQ_TIMEOUT_RESET)
 		blk_mq_rq_update_aborted_gstate(rq, 0);
 }
 
+static void blk_mq_timeout_reset_cleanup(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	/*
+	 * @rq is now fully returned to the normal path.  If normal
+	 * completion raced timeout handling, execute the missed completion
+	 * here.  This is safe because 1. ->missed_completion can no longer
+	 * be asserted because nothing is timing out right now and 2. if
+	 * ->missed_completion is set, @rq is safe from recycling because
+	 * nobody could have completed it.
+	 */
+	if ((rq->rq_flags & RQF_MQ_TIMEOUT_RESET) && rq->missed_completion)
+		blk_mq_complete_request(rq);
+}
+
 static void blk_mq_timeout_work(struct work_struct *work)
 {
 	struct request_queue *q =
@@ -976,7 +992,7 @@ static void blk_mq_timeout_work(struct w
 		 * becomes a problem, we can add per-hw_ctx rcu_head and
 		 * wait in parallel.
 		 */
-		blk_mq_timeout_sync_rcu(q);
+		blk_mq_timeout_sync_rcu(q, true);
 
 		/* terminate the ones we won */
 		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired,
@@ -988,9 +1004,12 @@ static void blk_mq_timeout_work(struct w
 		 * reset racing against recycling.
 		 */
 		if (nr_resets) {
-			blk_mq_timeout_sync_rcu(q);
+			blk_mq_timeout_sync_rcu(q, false);
+			blk_mq_queue_tag_busy_iter(q,
+					blk_mq_timeout_reset_return, NULL);
+			blk_mq_timeout_sync_rcu(q, true);
 			blk_mq_queue_tag_busy_iter(q,
-					blk_mq_finish_timeout_reset, NULL);
+					blk_mq_timeout_reset_cleanup, NULL);
 		}
 	}
 
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -227,6 +227,8 @@ struct request {
 
 	unsigned int extra_len;	/* length of alignment and padding */
 
+	bool missed_completion;
+
 	/*
 	 * On blk-mq, the lower bits of ->gstate (generation number and
 	 * state) carry the MQ_RQ_* state value and the upper bits the