From patchwork Thu Sep 7 18:40:17 2017
Date: Thu, 7 Sep 2017 11:40:17 -0700
From: Omar Sandoval
To: Bin Zha (查斌)
Cc: linux-block@vger.kernel.org, osandov@fb.com, axboe@fb.com, boyu.mt@taobao.com, wenqing.lz@taobao.com, qijiang.qj@alibaba-inc.com
Subject: Re: [PATCH] blk-mq-sched: remove the empty entry in token wait list
Message-ID: <20170907184017.GC17057@vader.DHCP.thefacebook.com>
List-ID: linux-block@vger.kernel.org
On Tue, Aug 29, 2017 at 10:52:13PM +0800, Bin Zha (查斌) wrote:
> From: Bin Zha
>
> When kyber adjusts the sync and other write depths down to the
> minimum (1), there is a case that can cause requests to be stalled
> in the kyber_hctx_data list.
>
> The following example I have tested:
>
>  CPU7
>  block_rq_insert
>  add_wait_queue
>  __sbitmap_queue_get         CPU23
>  block_rq_issue              block_rq_insert
>  block_rq_complete ------>   waiting token free
>                              block_rq_issue
>   /|\                        block_rq_complete
>    | |
>    | |
>    | \|/
>    |                         CPU29
>    |                         block_rq_insert
>    |                         waiting token free
>    |                         block_rq_issue
>    |----------------------   block_rq_complete
>
>  CPU1
>  block_rq_insert
>  waiting token free
>
> The I/O request completing on CPU29 will wake up CPU7, because CPU7
> was added to the waitqueue in kyber_get_domain_token(). But CPU7 is
> an empty waiter by then, so the wakeup is consumed without waking up
> CPU1. If no other request is issued to push it along, the requests
> stall in the kyber_hctx_data list.
>
> Signed-off-by: Bin Zha
>
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index b9faabc..584bfd5 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -548,6 +548,10 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
>  		 * queue.
>  		 */
>  		nr = __sbitmap_queue_get(domain_tokens);
> +		if (nr >= 0) {
> +			remove_wait_queue(&ws->wait, wait);
> +			INIT_LIST_HEAD(&wait->task_list);
> +		}
>  	}
>  	return nr;
>  }

Hm... I could've sworn I thought about this case when I wrote this
code, but your analysis looks correct. I do remember that I didn't do
this because I was worried about a race like so:

add_wait_queue()
                            kyber_domain_wake()
                             \_ list_del_init()
remove_wait_queue()
 \_ list_del()

But thinking about it some more, list_del() is probably safe after
list_del_init()? Have you been able to reproduce this?
I have the following patch to force the domain token pools to one
token; I imagine with the right workload we can hit it:

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index f58cab8..b4e1bd7 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -58,9 +58,9 @@ enum {
  * So, we cap these to a reasonable value.
  */
 static const unsigned int kyber_depth[] = {
-	[KYBER_READ] = 256,
-	[KYBER_SYNC_WRITE] = 128,
-	[KYBER_OTHER] = 64,
+	[KYBER_READ] = 1,
+	[KYBER_SYNC_WRITE] = 1,
+	[KYBER_OTHER] = 1,
 };
 
 /*
@@ -126,6 +126,7 @@ enum {
 #define IS_GOOD(status)	((status) > 0)
 #define IS_BAD(status)	((status) < 0)
 
+#if 0
 static int kyber_lat_status(struct blk_stat_callback *cb,
 			    unsigned int sched_domain, u64 target)
 {
@@ -243,6 +244,7 @@ static void kyber_adjust_other_depth(struct kyber_queue_data *kqd,
 	if (depth != orig_depth)
 		sbitmap_queue_resize(&kqd->domain_tokens[KYBER_OTHER], depth);
 }
+#endif
 
 /*
  * Apply heuristics for limiting queue depths based on gathered latency
@@ -250,6 +252,8 @@ static void kyber_adjust_other_depth(struct kyber_queue_data *kqd,
  */
 static void kyber_stat_timer_fn(struct blk_stat_callback *cb)
 {
+	return;
+#if 0
 	struct kyber_queue_data *kqd = cb->data;
 	int read_status, write_status;
 
@@ -269,6 +273,7 @@ static void kyber_stat_timer_fn(struct blk_stat_callback *cb)
 	    ((IS_BAD(read_status) || IS_BAD(write_status) ||
 	      kqd->domain_tokens[KYBER_OTHER].sb.depth < kyber_depth[KYBER_OTHER])))
 		blk_stat_activate_msecs(kqd->cb, 100);
+#endif
 }
 
 static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)