From patchwork Sun Feb 18 13:11:44 2018
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10226751
Date: Sun, 18 Feb 2018 05:11:44 -0800
From: "tj@kernel.org"
To: Bart Van Assche
Cc: "hch@lst.de", "linux-block@vger.kernel.org", "axboe@kernel.dk"
Subject: Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling
Message-ID: <20180218131144.GX695913@devbig577.frc2.facebook.com>
In-Reply-To: <1518627534.3147.6.camel@wdc.com>
X-Mailing-List: linux-block@vger.kernel.org

Hello, Bart.

On Wed, Feb 14, 2018 at 04:58:56PM +0000, Bart Van Assche wrote:
> With this patch applied the tests I ran so far pass.

Ah, great to hear. Thanks a lot for testing. Can you please verify
the following? It's the same approach but with RCU sync batching.

Thanks.
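As a rough illustration of what the RCU sync batching in the patch below buys, here is a userspace C model of the loop in the new blk_mq_timeout_sync_rcu() helper. struct model_hctx, MODEL_F_BLOCKING and the model_synchronize_*() stubs are hypothetical stand-ins that only count how many grace-period waits would be issued; no real RCU or SRCU is involved.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the batching in blk_mq_timeout_sync_rcu().
 * Everything named model_* / MODEL_* is a hypothetical stand-in:
 * the "synchronize" stubs only count how many waits would happen.
 */
#define MODEL_F_BLOCKING 0x1	/* mirrors BLK_MQ_F_BLOCKING */

struct model_hctx {
	unsigned int flags;
	bool need_sync_rcu;	/* set when a request on this hctx expired or was reset */
	int srcu_waits;		/* number of per-hctx SRCU waits issued */
};

static int rcu_waits;		/* number of global RCU waits issued */

static void model_synchronize_rcu(void)
{
	rcu_waits++;
}

static void model_synchronize_srcu(struct model_hctx *hctx)
{
	/* Only hctxs whose ->queue_rq() may block are protected by SRCU. */
	assert(hctx->flags & MODEL_F_BLOCKING);
	hctx->srcu_waits++;
}

static void model_timeout_sync_rcu(struct model_hctx *hctxs, int nr)
{
	bool has_rcu = false;

	for (int i = 0; i < nr; i++) {
		struct model_hctx *hctx = &hctxs[i];

		if (!hctx->need_sync_rcu)
			continue;

		/*
		 * Non-blocking hctxs share plain RCU, so one deferred wait
		 * covers all of them; each blocking hctx has its own SRCU
		 * domain and must be synced individually.
		 */
		if (!(hctx->flags & MODEL_F_BLOCKING))
			has_rcu = true;
		else
			model_synchronize_srcu(hctx);

		hctx->need_sync_rcu = false;
	}
	if (has_rcu)
		model_synchronize_rcu();
}
```

With two flagged non-blocking hctxs and one flagged blocking hctx, a call issues exactly one RCU wait and one SRCU wait, and clears every need_sync_rcu flag; a second call with nothing flagged issues no waits at all. That is why the patch can call the helper twice in blk_mq_timeout_work() without paying for a grace period per request.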
Index: work/block/blk-mq.c
===================================================================
--- work.orig/block/blk-mq.c
+++ work/block/blk-mq.c
@@ -816,7 +816,8 @@ struct blk_mq_timeout_data {
 	unsigned int nr_expired;
 };
 
-static void blk_mq_rq_timed_out(struct request *req, bool reserved)
+static void blk_mq_rq_timed_out(struct blk_mq_hw_ctx *hctx, struct request *req,
+				int *nr_resets, bool reserved)
 {
 	const struct blk_mq_ops *ops = req->q->mq_ops;
 	enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
@@ -831,13 +832,10 @@ static void blk_mq_rq_timed_out(struct r
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
-		/*
-		 * As nothing prevents from completion happening while
-		 * ->aborted_gstate is set, this may lead to ignored
-		 * completions and further spurious timeouts.
-		 */
-		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
+		req->rq_flags |= RQF_MQ_TIMEOUT_RESET;
+		(*nr_resets)++;
+		hctx->need_sync_rcu = true;
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;
@@ -874,13 +872,34 @@ static void blk_mq_check_expired(struct
 	    time_after_eq(jiffies, deadline)) {
 		blk_mq_rq_update_aborted_gstate(rq, gstate);
 		data->nr_expired++;
-		hctx->nr_expired++;
+		hctx->need_sync_rcu = true;
 	} else if (!data->next_set || time_after(data->next, deadline)) {
 		data->next = deadline;
 		data->next_set = 1;
 	}
 }
 
+static void blk_mq_timeout_sync_rcu(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	bool has_rcu = false;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (!hctx->need_sync_rcu)
+			continue;
+
+		if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+			has_rcu = true;
+		else
+			synchronize_srcu(hctx->srcu);
+
+		hctx->need_sync_rcu = false;
+	}
+	if (has_rcu)
+		synchronize_rcu();
+}
+
 static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
@@ -893,7 +912,25 @@ static void blk_mq_terminate_expired(str
 	 */
 	if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
 	    READ_ONCE(rq->gstate) == rq->aborted_gstate)
-		blk_mq_rq_timed_out(rq, reserved);
+		blk_mq_rq_timed_out(hctx, rq, priv, reserved);
+}
+
+static void blk_mq_finish_timeout_reset(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	/*
+	 * @rq's timer reset has gone through rcu synchronization and is
+	 * visible now. Allow normal completions again by resetting
+	 * ->aborted_gstate. Don't clear RQF_MQ_TIMEOUT_RESET here as
+	 * there's no memory barrier around ->aborted_gstate. Let
+	 * blk_add_timer() clear it later.
+	 *
+	 * As nothing prevents from completion happening while
+	 * ->aborted_gstate is set, this may lead to ignored completions
+	 * and further spurious timeouts.
+	 */
+	if (rq->rq_flags & RQF_MQ_TIMEOUT_RESET)
+		blk_mq_rq_update_aborted_gstate(rq, 0);
 }
 
 static void blk_mq_timeout_work(struct work_struct *work)
@@ -928,7 +965,7 @@ static void blk_mq_timeout_work(struct w
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
 
 	if (data.nr_expired) {
-		bool has_rcu = false;
+		int nr_resets = 0;
 
 		/*
 		 * Wait till everyone sees ->aborted_gstate. The
@@ -936,22 +973,22 @@ static void blk_mq_timeout_work(struct w
 		 * becomes a problem, we can add per-hw_ctx rcu_head and
 		 * wait in parallel.
 		 */
-		queue_for_each_hw_ctx(q, hctx, i) {
-			if (!hctx->nr_expired)
-				continue;
-
-			if (!(hctx->flags & BLK_MQ_F_BLOCKING))
-				has_rcu = true;
-			else
-				synchronize_srcu(hctx->srcu);
-
-			hctx->nr_expired = 0;
-		}
-		if (has_rcu)
-			synchronize_rcu();
+		blk_mq_timeout_sync_rcu(q);
 
 		/* terminate the ones we won */
-		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
+		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired,
+					   &nr_resets);
+
+		/*
+		 * For BLK_EH_RESET_TIMER, release the requests after
+		 * blk_add_timer() from above is visible to avoid timer
+		 * reset racing against recycling.
+		 */
+		if (nr_resets) {
+			blk_mq_timeout_sync_rcu(q);
+			blk_mq_queue_tag_busy_iter(q,
+					blk_mq_finish_timeout_reset, NULL);
+		}
 	}
 
 	if (data.next_set) {
Index: work/include/linux/blk-mq.h
===================================================================
--- work.orig/include/linux/blk-mq.h
+++ work/include/linux/blk-mq.h
@@ -51,7 +51,7 @@ struct blk_mq_hw_ctx {
 	unsigned int		queue_num;
 
 	atomic_t		nr_active;
-	unsigned int		nr_expired;
+	bool			need_sync_rcu;
 
 	struct hlist_node	cpuhp_dead;
 	struct kobject		kobj;
Index: work/block/blk-timeout.c
===================================================================
--- work.orig/block/blk-timeout.c
+++ work/block/blk-timeout.c
@@ -216,7 +216,7 @@ void blk_add_timer(struct request *req)
 		req->timeout = q->rq_timeout;
 
 	blk_rq_set_deadline(req, jiffies + req->timeout);
-	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
+	req->rq_flags &= ~(RQF_MQ_TIMEOUT_EXPIRED | RQF_MQ_TIMEOUT_RESET);
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
Index: work/include/linux/blkdev.h
===================================================================
--- work.orig/include/linux/blkdev.h
+++ work/include/linux/blkdev.h
@@ -127,8 +127,10 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_ZONE_WRITE_LOCKED	((__force req_flags_t)(1 << 19))
 /* timeout is expired */
 #define RQF_MQ_TIMEOUT_EXPIRED	((__force req_flags_t)(1 << 20))
+/* timeout is expired */
+#define RQF_MQ_TIMEOUT_RESET	((__force req_flags_t)(1 << 21))
 /* already slept for hybrid poll */
-#define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 21))
+#define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 22))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
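To make the two-phase handshake in the patch easier to follow, here is a single-threaded userspace C model of one request taking the BLK_EH_RESET_TIMER path. All model_* names are hypothetical, and the M_TIMEOUT_* bits only mirror RQF_MQ_TIMEOUT_EXPIRED and RQF_MQ_TIMEOUT_RESET. Grace periods and concurrency are not simulated: each function stands for the step the patch runs after the corresponding blk_mq_timeout_sync_rcu() call, so the sketch shows only the state transitions on ->aborted_gstate and ->rq_flags.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Single-threaded model of one request on the BLK_EH_RESET_TIMER path.
 * All model_* names are hypothetical; the M_* bits mirror the patch's
 * RQF_MQ_TIMEOUT_EXPIRED (bit 20) and RQF_MQ_TIMEOUT_RESET (bit 21).
 */
#define M_TIMEOUT_EXPIRED (1u << 20)
#define M_TIMEOUT_RESET   (1u << 21)

struct model_rq {
	unsigned long gstate;		/* opaque nonzero token while in flight */
	unsigned long aborted_gstate;	/* == gstate while the timeout path owns rq */
	unsigned int rq_flags;
	bool completed;
};

/* The flag handling blk_add_timer() gets from the patch. */
static void model_add_timer(struct model_rq *rq)
{
	rq->rq_flags &= ~(M_TIMEOUT_EXPIRED | M_TIMEOUT_RESET);
}

/* Like blk_mq_complete_request(): drop the completion while claimed. */
static bool model_complete(struct model_rq *rq)
{
	if (rq->aborted_gstate == rq->gstate)
		return false;
	rq->completed = true;
	return true;
}

/* blk_mq_check_expired(), assuming the deadline has already passed. */
static void model_check_expired(struct model_rq *rq)
{
	rq->aborted_gstate = rq->gstate;
}

/* blk_mq_terminate_expired() plus blk_mq_rq_timed_out() for a driver
 * whose ->timeout() returns BLK_EH_RESET_TIMER; runs after the first sync. */
static void model_terminate_expired(struct model_rq *rq, int *nr_resets)
{
	if ((rq->rq_flags & M_TIMEOUT_EXPIRED) ||
	    rq->gstate != rq->aborted_gstate)
		return;			/* the normal completion won */

	rq->rq_flags |= M_TIMEOUT_EXPIRED;
	model_add_timer(rq);		/* clears EXPIRED and RESET */
	rq->rq_flags |= M_TIMEOUT_RESET;
	(*nr_resets)++;
}

/* blk_mq_finish_timeout_reset(); runs only after the second sync. */
static void model_finish_timeout_reset(struct model_rq *rq)
{
	if (rq->rq_flags & M_TIMEOUT_RESET) {
		/* Completions were dropped until now, so in this model rq
		 * still carries the gstate that was claimed. */
		assert(rq->gstate == rq->aborted_gstate);
		rq->aborted_gstate = 0;
	}
}
```

Stepping through it by hand shows the intended window: after model_check_expired() a normal completion is dropped, it is still dropped once the reset has been recorded, and only model_finish_timeout_reset() lets completions through again. Completions dropped in that window are the "ignored completions" the comment in blk_mq_finish_timeout_reset() warns about; RQF_MQ_TIMEOUT_RESET itself stays set until a later blk_add_timer() clears it.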