From patchwork Tue Jul 16 20:19:26 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11046693
From: Josef Bacik
To: axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org,
 peterz@infradead.org, oleg@redhat.com
Subject: [PATCH 2/5] rq-qos: fix missed wake-ups in rq_qos_throttle
Date: Tue, 16 Jul 2019 16:19:26 -0400
Message-Id: <20190716201929.79142-3-josef@toxicpanda.com>
X-Mailer: git-send-email 2.13.5
In-Reply-To: <20190716201929.79142-1-josef@toxicpanda.com>
References: <20190716201929.79142-1-josef@toxicpanda.com>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO.  This is because of the
has_sleeper optimization, which makes sure new submitters don't steal an
inflight counter when there are already waiters on the list.

The check for waiters on the waitqueue is done locklessly, so it can
race with the point where we actually add ourselves to the waitqueue.
If this happens we'll go to sleep and never be woken up, because nobody
is doing IO to wake us up.

Fix this by checking whether the waitqueue has a single sleeper on the
list after we add ourselves; that way we have an up-to-date view of the
list.

Signed-off-by: Josef Bacik
---
 block/blk-rq-qos.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 659ccb8b693f..67a0a4c07060 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -244,6 +244,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 		return;
 
 	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
+	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
 	do {
 		if (data.got_token)
 			break;
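
The interleaving described in the commit message can be replayed in a small
user-space model. This is only a sketch under stated assumptions, not kernel
code: the waitqueue is modeled as a singly linked list, every model_* name and
replay_race() are hypothetical, and wq_has_single_sleeper() is assumed to
report whether exactly one entry is on the waitqueue, as its name suggests.

```c
/* User-space model only. All types and functions here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_waiter {
	struct model_waiter *next;
};

struct model_waitqueue {
	struct model_waiter *head;
};

/* Models the lockless "is anybody waiting?" check done before enqueueing. */
static bool model_has_sleeper(const struct model_waitqueue *wq)
{
	return wq->head != NULL;
}

/* Models the assumed wq_has_single_sleeper(): exactly one entry queued. */
static bool model_has_single_sleeper(const struct model_waitqueue *wq)
{
	return wq->head != NULL && wq->head->next == NULL;
}

/* Append a waiter at the tail of the list. */
static void model_add_tail(struct model_waitqueue *wq, struct model_waiter *w)
{
	struct model_waiter **pp = &wq->head;

	while (*pp)
		pp = &(*pp)->next;
	w->next = NULL;
	*pp = w;
}

/* Unlink a waiter if it is on the list. */
static void model_remove(struct model_waitqueue *wq, struct model_waiter *w)
{
	struct model_waiter **pp = &wq->head;

	while (*pp && *pp != w)
		pp = &(*pp)->next;
	if (*pp)
		*pp = w->next;
}

/*
 * Replay the race: we snapshot "someone is sleeping", that sleeper leaves,
 * and only then do we enqueue ourselves. Returns the has_sleeper value the
 * wait loop would act on.
 */
static bool replay_race(bool recheck_after_add)
{
	struct model_waitqueue wq = { NULL };
	struct model_waiter other, self;
	bool has_sleeper;

	model_add_tail(&wq, &other);
	has_sleeper = model_has_sleeper(&wq);	/* snapshot taken here */
	assert(has_sleeper);			/* it saw the other waiter */

	model_remove(&wq, &other);		/* other waiter is woken and gone */
	model_add_tail(&wq, &self);		/* we enqueue ourselves */

	if (recheck_after_add)			/* the one-line fix, in model form */
		has_sleeper = !model_has_single_sleeper(&wq);

	return has_sleeper;
}
```

Without the recheck, replay_race(false) returns true: the caller still
believes somebody else is queued ahead of it and goes to sleep with nobody
left to wake it. With the recheck, replay_race(true) returns false, because
the only entry on the list is the caller itself.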