From patchwork Wed Feb 7 21:02:21 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10206021
From: Bart Van Assche
To: "tj@kernel.org"
CC: "hch@lst.de", "linux-block@vger.kernel.org", "axboe@kernel.dk"
Subject: Re: [PATCH v2] blk-mq: Fix race between resetting the timer and
 completion handling
Date: Wed, 7 Feb 2018 21:02:21 +0000
Message-ID: <1518037339.2870.61.camel@wdc.com>
References: <20180207011133.25957-1-bart.vanassche@wdc.com>
 <20180207170612.GB695913@devbig577.frc2.facebook.com>
 <1518030233.2870.57.camel@wdc.com>
 <20180207200951.GE695913@devbig577.frc2.facebook.com>
In-Reply-To: <20180207200951.GE695913@devbig577.frc2.facebook.com>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

On Wed, 2018-02-07 at 12:09 -0800, tj@kernel.org wrote:
> Hello,
>
> On Wed, Feb 07, 2018 at 07:03:56PM +0000, Bart Van Assche wrote:
> > I tried the above patch but already during the first iteration of the test I
> > noticed that the test hung, probably due to the following request that got stuck:
> >
> > $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_list)
> > 00000000a98cff60 {.op=SCSI_IN, .cmd_flags=, .rq_flags=MQ_INFLIGHT|PREEMPT|QUIET|IO_STAT|PM,
> >   .state=idle, .tag=22, .internal_tag=-1, .cmd=Synchronize Cache(10) 35 00 00 00 00 00, .retries=0,
> >   .result = 0x0, .flags=TAGGED, .timeout=60.000, allocated 872.690 s ago}
>
> I wonder how this happened. We can lose a completion when it
> races against BLK_EH_RESET_TIMER; however, the command should time out
> later because the timer is running again now. Maybe we actually had the
> memory barrier race that you pointed out in the other message?

Hello Tejun,

The patch that I used in my test had an smp_wmb() call (see also below).
Anyway, I will see whether I can extract more state information through
debugfs.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ef4f6df0f1df..8eb2105d82b7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -827,13 +827,9 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
-		/*
-		 * As nothing prevents from completion happening while
-		 * ->aborted_gstate is set, this may lead to ignored
-		 * completions and further spurious timeouts.
-		 */
-		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
+		smp_wmb();
+		blk_mq_rq_update_aborted_gstate(req, 0);
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;
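
For reference, the ordering that the smp_wmb() in the diff above is meant to
provide can be modelled in user space roughly as follows. This is only a
sketch, not kernel code: struct fake_request, reset_timer() and observe() are
hypothetical names, C11 fences stand in for smp_wmb()/smp_rmb(), and the two
fields only approximate what blk_add_timer() and
blk_mq_rq_update_aborted_gstate() touch. The point it illustrates is that a
write barrier on the timeout side orders the two stores, but a reader is only
guaranteed to see the re-armed timer if it orders its own loads as well.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pieces of request state involved. */
struct fake_request {
	_Atomic uint64_t deadline;       /* models the deadline that re-arming the timer sets */
	_Atomic uint64_t aborted_gstate; /* non-zero: completions are being ignored */
};

/* Writer side: same store order as the BLK_EH_RESET_TIMER case in the diff. */
static void reset_timer(struct fake_request *rq, uint64_t new_deadline)
{
	atomic_store_explicit(&rq->deadline, new_deadline, memory_order_relaxed);
	/* Analogue of smp_wmb(): the store above is ordered before the store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&rq->aborted_gstate, 0, memory_order_relaxed);
}

/*
 * Reader side: returns false while the request is still marked aborted.
 * Otherwise stores the deadline in *deadline and returns true. The acquire
 * fence (analogue of smp_rmb()) pairs with the release fence in
 * reset_timer(), so a reader that sees aborted_gstate == 0 also sees the
 * deadline written before it.
 */
static bool observe(struct fake_request *rq, uint64_t *deadline)
{
	if (atomic_load_explicit(&rq->aborted_gstate, memory_order_relaxed) != 0)
		return false;
	atomic_thread_fence(memory_order_acquire);
	*deadline = atomic_load_explicit(&rq->deadline, memory_order_relaxed);
	return true;
}
```

Without the fence in observe(), the reader's two loads may be reordered on
weakly ordered CPUs, in which case the fence on the writer side alone gives no
guarantee.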