From patchwork Mon Apr 2 19:01:20 2018
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10320215
Date: Mon, 2 Apr 2018 12:01:20 -0700
From: Tejun Heo
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution
Message-ID: <20180402190120.GD388343@devbig577.frc2.facebook.com>
References: <20180402190053.GC388343@devbig577.frc2.facebook.com>
In-Reply-To: <20180402190053.GC388343@devbig577.frc2.facebook.com>
X-Mailing-List: 
linux-block@vger.kernel.org

When a request is handed over from normal execution to the timeout path, we
synchronize using ->aborted_gstate and RCU grace periods; however, when a
request is returned from timeout handling to normal execution for
BLK_EH_RESET_TIMER, we were skipping the same synchronization.  This means
that it is theoretically possible for a returned request's completion and
recycling to race against the reordered and delayed writes from the
timeout path.

This patch adds an equivalent synchronization when a request is returned
from the timeout path to the normal completion path.

Signed-off-by: Tejun Heo
Cc: Bart Van Assche
---
 block/blk-mq.c         | 49 ++++++++++++++++++++++++++++++++++++++++---------
 block/blk-timeout.c    |  2 +-
 include/linux/blkdev.h |  4 +++-
 3 files changed, 44 insertions(+), 11 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -818,7 +818,8 @@ struct blk_mq_timeout_data {
 	unsigned int nr_expired;
 };
 
-static void blk_mq_rq_timed_out(struct request *req, bool reserved)
+static void blk_mq_rq_timed_out(struct blk_mq_hw_ctx *hctx, struct request *req,
+				int *nr_resets, bool reserved)
 {
 	const struct blk_mq_ops *ops = req->q->mq_ops;
 	enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
@@ -833,13 +834,10 @@ static void blk_mq_rq_timed_out(struct r
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
-		/*
-		 * As nothing prevents from completion happening while
-		 * ->aborted_gstate is set, this may lead to ignored
-		 * completions and further spurious timeouts.
-		 */
-		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
+		req->rq_flags |= RQF_MQ_TIMEOUT_RESET;
+		(*nr_resets)++;
+		hctx->need_sync_rcu = true;
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;
@@ -916,7 +914,26 @@ static void blk_mq_terminate_expired(str
 	 */
 	if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
 	    READ_ONCE(rq->gstate) == rq->aborted_gstate)
-		blk_mq_rq_timed_out(rq, reserved);
+		blk_mq_rq_timed_out(hctx, rq, priv, reserved);
+}
+
+static void blk_mq_finish_timeout_reset(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	/*
+	 * @rq's timer reset has gone through rcu synchronization and is
+	 * visible now.  Allow normal completions again by resetting
+	 * ->aborted_gstate.  Don't clear RQF_MQ_TIMEOUT_RESET here as
+	 * there's no memory ordering around ->aborted_gstate making it the
+	 * only field safe to update.  Let blk_add_timer() clear it later
+	 * when the request is recycled or times out again.
+	 *
+	 * As nothing prevents from completion happening while
+	 * ->aborted_gstate is set, this may lead to ignored completions
+	 * and further spurious timeouts.
+	 */
+	if (rq->rq_flags & RQF_MQ_TIMEOUT_RESET)
+		blk_mq_rq_update_aborted_gstate(rq, 0);
 }
 
 static void blk_mq_timeout_work(struct work_struct *work)
@@ -951,6 +968,8 @@ static void blk_mq_timeout_work(struct w
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
 
 	if (data.nr_expired) {
+		int nr_resets = 0;
+
 		/*
 		 * Wait till everyone sees ->aborted_gstate.  The
 		 * sequential waits for SRCUs aren't ideal.  If this ever
@@ -960,7 +979,19 @@ static void blk_mq_timeout_work(struct w
 		blk_mq_timeout_sync_rcu(q);
 
 		/* terminate the ones we won */
-		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
+		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired,
+					   &nr_resets);
+
+		/*
+		 * For BLK_EH_RESET_TIMER, release the requests after
+		 * blk_add_timer() from above is visible to avoid timer
+		 * reset racing against recycling.
+		 */
+		if (nr_resets) {
+			blk_mq_timeout_sync_rcu(q);
+			blk_mq_queue_tag_busy_iter(q,
+					blk_mq_finish_timeout_reset, NULL);
+		}
 	}
 
 	if (data.next_set) {
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -216,7 +216,7 @@ void blk_add_timer(struct request *req)
 		req->timeout = q->rq_timeout;
 
 	blk_rq_set_deadline(req, jiffies + req->timeout);
-	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
+	req->rq_flags &= ~(RQF_MQ_TIMEOUT_EXPIRED | RQF_MQ_TIMEOUT_RESET);
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -127,8 +127,10 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_ZONE_WRITE_LOCKED	((__force req_flags_t)(1 << 19))
 /* timeout is expired */
 #define RQF_MQ_TIMEOUT_EXPIRED	((__force req_flags_t)(1 << 20))
+/* timeout needs to be reset */
+#define RQF_MQ_TIMEOUT_RESET	((__force req_flags_t)(1 << 21))
 /* already slept for hybrid poll */
-#define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 21))
+#define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 22))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS	\