From patchwork Wed Jul 10 19:52:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 11038875 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E10DE6C5 for ; Wed, 10 Jul 2019 19:52:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D33E428765 for ; Wed, 10 Jul 2019 19:52:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C74AB289DA; Wed, 10 Jul 2019 19:52:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6DF2428765 for ; Wed, 10 Jul 2019 19:52:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728243AbfGJTwf (ORCPT ); Wed, 10 Jul 2019 15:52:35 -0400 Received: from mail-qt1-f195.google.com ([209.85.160.195]:46986 "EHLO mail-qt1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728211AbfGJTwd (ORCPT ); Wed, 10 Jul 2019 15:52:33 -0400 Received: by mail-qt1-f195.google.com with SMTP id h21so3771293qtn.13 for ; Wed, 10 Jul 2019 12:52:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:subject:date:message-id:in-reply-to:references; bh=h2XA2X1ZY1EHxtyN0Imads8NvEkfjbdW711+ZDB5c94=; b=1p+GRmWLsFIydILSpH4SUU4qr33pL1CfrGVK7adXHtu2RSh053cgJVt/KL03koY5Gn +IE3FsqsOtjrvoaqjzN+STTVBDYcATwAhiiZ72usKmSe+2GmdGTw1VqUJ81hmVRPqt9H WVj4/Hj7zFsQ8S6Nbzk0xZZZsKW6+YytJ25j+yLDZfZhiDbpUnAxCfHiWBhQTNDZ1k6X cogq57vJw2oU8h0NdUvaJkKFy1dL3TbV6KTF9ZGhf1fy1PVEBOBt9AxARFnuvgi3ohVY QKgO1aJtdScOA9O3P99DUQsZDSLrrRDeG2axQh+RCu2+Oqna0uk/n/Ihmh9h7Bc0bSLL l8ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=h2XA2X1ZY1EHxtyN0Imads8NvEkfjbdW711+ZDB5c94=; b=gJoYNr+MpaQRkBXCKjRScAvPRIyUX2goyuwJn6cNa3V6h9nF6yYAG3u55Iq21vvM2h sJAXQNwT3lr79mMeYgk4PgELRiHwu2TwJQ7j1onbuYmg75DBFGk295a4Onuwz755X4aS SUJL8IHxJYXgnpga8LjNUkB3qVBRSZFX3pd54OoplbBmTM59riHY47bkXZLvIMQDvBkm uCh/4OJwfQ6axcNRSolnGHEBEpySc94NgJz/Fp9P1nBx7S81fv0TbdaxJA7kibTdqwvO 1xAzuLAiaEBhgmplvtQCJLRUFdeGG9zWiIYcxpNXEQ3FXmY3JqGvFlMCC0WESLmXzQOX 4toQ== X-Gm-Message-State: APjAAAXuA4PERaU9f671CQXI57NUfzlulI8rQLn2Dt7YTF7cfRvR3E/3 DGkX3mk1b61hHEcXUGYYaRTDlQ== X-Google-Smtp-Source: APXvYqyBfvrfYhxjdfcV4ySFNAvtIm75K1LNcIgBlruRP7C/8rwENvmllFbXCx4jBoSGkRQebtMdHQ== X-Received: by 2002:ac8:2fa8:: with SMTP id l37mr25255325qta.358.1562788352412; Wed, 10 Jul 2019 12:52:32 -0700 (PDT) Received: from localhost ([2620:10d:c091:480::5ce]) by smtp.gmail.com with ESMTPSA id i1sm1386769qtb.7.2019.07.10.12.52.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 10 Jul 2019 12:52:31 -0700 (PDT) From: Josef Bacik To: kernel-team@fb.com, axboe@kernel.dk, linux-block@vger.kernel.org Subject: [PATCH 2/2] rq-qos: fix missed wake-ups in rq_qos_throttle Date: Wed, 10 Jul 2019 15:52:27 -0400 Message-Id: <20190710195227.92322-2-josef@toxicpanda.com> X-Mailer: git-send-email 2.13.5 In-Reply-To: <20190710195227.92322-1-josef@toxicpanda.com> References: <20190710195227.92322-1-josef@toxicpanda.com> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We saw a hang in production with WBT where there was only one waiter in the throttle path and no outstanding IO. This is because of the has_sleepers optimization that is used to make sure we don't steal an inflight counter for new submitters when there are people already on the list. We can race with our check to see if the waitqueue has any waiters (this is done locklessly) and the time we actually add ourselves to the waitqueue. If this happens we'll go to sleep and never be woken up because nobody is doing IO to wake us up. Fix this by checking if the waitqueue has multiple sleepers after we add ourselves to the list, that way we have an uptodate view of the list. Signed-off-by: Josef Bacik --- block/blk-rq-qos.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 659ccb8b693f..b39b5f3fb01b 100644 --- a/block/blk-rq-qos.c +++ b/block/blk-rq-qos.c @@ -244,6 +244,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data, return; prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE); + has_sleeper = wq_has_multiple_sleepers(&rqw->wait); do { if (data.got_token) break;