From patchwork Thu Aug 9 20:26:43 2018
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10561899
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [RFC PATCH 1/5] block: move call of scheduler's ->completed_request() hook
Date: Thu, 9 Aug 2018 13:26:43 -0700

From: Omar Sandoval

Commit 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()")
consolidated some calls using ktime_get() so we'd only need to call it once.
Kyber's ->completed_request() hook also calls ktime_get(), so let's move it to
the same place, too.

Signed-off-by: Omar Sandoval
---
 block/blk-mq-sched.h     | 4 ++--
 block/blk-mq.c           | 5 +++--
 block/kyber-iosched.c    | 5 ++---
 include/linux/elevator.h | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 0cb8f938dff9..74fb6ff9a30d 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -54,12 +54,12 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 	return true;
 }
 
-static inline void blk_mq_sched_completed_request(struct request *rq)
+static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 {
 	struct elevator_queue *e = rq->q->elevator;
 
 	if (e && e->type->ops.mq.completed_request)
-		e->type->ops.mq.completed_request(rq);
+		e->type->ops.mq.completed_request(rq, now);
 }
 
 static inline void blk_mq_sched_started_request(struct request *rq)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 654b0dc7e001..0c5be0001d0f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -524,6 +524,9 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 		blk_stat_add(rq, now);
 	}
 
+	if (rq->internal_tag != -1)
+		blk_mq_sched_completed_request(rq, now);
+
 	blk_account_io_done(rq, now);
 
 	if (rq->end_io) {
@@ -560,8 +563,6 @@ static void __blk_mq_complete_request(struct request *rq)
 	if (!blk_mq_mark_complete(rq))
 		return;
 
-	if (rq->internal_tag != -1)
-		blk_mq_sched_completed_request(rq);
 
 	if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) {
 		rq->q->softirq_done_fn(rq);
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a1660bafc912..95d062c07c61 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -558,12 +558,12 @@ static void kyber_finish_request(struct request *rq)
 	rq_clear_domain_token(kqd, rq);
 }
 
-static void kyber_completed_request(struct request *rq)
+static void kyber_completed_request(struct request *rq, u64 now)
 {
 	struct request_queue *q = rq->q;
 	struct kyber_queue_data *kqd = q->elevator->elevator_data;
 	unsigned int sched_domain;
-	u64 now, latency, target;
+	u64 latency, target;
 
 	/*
	 * Check if this request met our latency goal. If not, quickly gather
@@ -585,7 +585,6 @@ static void kyber_completed_request(struct request *rq)
 	if (blk_stat_is_active(kqd->cb))
 		return;
 
-	now = ktime_get_ns();
 	if (now < rq->io_start_time_ns)
 		return;
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index a02deea30185..015bb59c0331 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -111,7 +111,7 @@ struct elevator_mq_ops {
 	void (*insert_requests)(struct blk_mq_hw_ctx *, struct list_head *, bool);
 	struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);
 	bool (*has_work)(struct blk_mq_hw_ctx *);
-	void (*completed_request)(struct request *);
+	void (*completed_request)(struct request *, u64);
 	void (*started_request)(struct request *);
 	void (*requeue_request)(struct request *);
 	struct request *(*former_request)(struct request_queue *, struct request *);

From patchwork Thu Aug 9 20:26:44 2018
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10561901
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [RFC PATCH 2/5] block: export blk_stat_enable_accounting()
Date: Thu, 9 Aug 2018 13:26:44 -0700
Message-Id: <0201bd7f86112ddbe7c190a1344b6a030cc2e971.1533846185.git.osandov@fb.com>

From: Omar Sandoval

Kyber will need this in a future change if it is built as a module.
Signed-off-by: Omar Sandoval
---
 block/blk-stat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-stat.c b/block/blk-stat.c
index 175c143ac5b9..d98f3ad6794e 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -190,6 +190,7 @@ void blk_stat_enable_accounting(struct request_queue *q)
 	blk_queue_flag_set(QUEUE_FLAG_STATS, q);
 	spin_unlock(&q->stats->lock);
 }
+EXPORT_SYMBOL_GPL(blk_stat_enable_accounting);
 
 struct blk_queue_stats *blk_alloc_queue_stats(void)
 {

From patchwork Thu Aug 9 20:26:45 2018
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10561903
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [RFC PATCH 3/5] kyber: don't make domain token sbitmap larger than necessary
Date: Thu, 9 Aug 2018 13:26:45 -0700
Message-Id: <47e82117e82563637b9a7091b827267f6874b9d2.1533846185.git.osandov@fb.com>

From: Omar Sandoval

The domain token sbitmaps are currently initialized to the device queue depth
or 256, whichever is larger, and immediately resized to the maximum depth for
that domain (256, 128, or 64 for read, write, and other, respectively). The
sbitmap is never resized larger than that, so it's unnecessary to allocate a
bitmap larger than the maximum depth. Let's just allocate it to the maximum
depth to begin with. This will use marginally less memory, and more
importantly, give us a more appropriate number of bits per sbitmap word.

Signed-off-by: Omar Sandoval
---
 block/kyber-iosched.c | 15 ++------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 95d062c07c61..08eb5295c18d 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -40,8 +40,6 @@ enum {
 };
 
 enum {
-	KYBER_MIN_DEPTH = 256,
-
 	/*
	 * In order to prevent starvation of synchronous requests by a flood of
	 * asynchronous requests, we reserve 25% of requests for synchronous
@@ -305,7 +303,6 @@ static int kyber_bucket_fn(const struct request *rq)
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 {
 	struct kyber_queue_data *kqd;
-	unsigned int max_tokens;
 	unsigned int shift;
 	int ret = -ENOMEM;
 	int i;
@@ -320,25 +317,17 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 	if (!kqd->cb)
 		goto err_kqd;
 
-	/*
-	 * The maximum number of tokens for any scheduling domain is at least
-	 * the queue depth of a single hardware queue. If the hardware doesn't
-	 * have many tags, still provide a reasonable number.
-	 */
-	max_tokens = max_t(unsigned int, q->tag_set->queue_depth,
-			   KYBER_MIN_DEPTH);
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
 		WARN_ON(!kyber_depth[i]);
 		WARN_ON(!kyber_batch_size[i]);
 		ret = sbitmap_queue_init_node(&kqd->domain_tokens[i],
-					      max_tokens, -1, false, GFP_KERNEL,
-					      q->node);
+					      kyber_depth[i], -1, false,
+					      GFP_KERNEL, q->node);
 		if (ret) {
 			while (--i >= 0)
 				sbitmap_queue_free(&kqd->domain_tokens[i]);
 			goto err_cb;
 		}
-		sbitmap_queue_resize(&kqd->domain_tokens[i], kyber_depth[i]);
 	}
 
 	shift = kyber_sched_tags_shift(kqd);

From patchwork Thu Aug 9 20:26:46 2018
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10561905
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [RFC PATCH 4/5] kyber: implement improved heuristics
Date: Thu, 9 Aug 2018 13:26:46 -0700
Message-Id: <4440c3f4f1e58f6790cbae57cc138b77cbad84b8.1533846185.git.osandov@fb.com>

From: Omar Sandoval

Kyber's current heuristics have a few flaws:

- It's based on the mean latency, but p99 latency tends to be more meaningful
  to anyone who cares about latency. The mean can also be skewed by rare
  outliers that the scheduler can't do anything about.
- The statistics calculations are purely time-based with a short window. This
  works for steady, high load, but is more sensitive to outliers with bursty
  workloads.
- It only considers the latency once an I/O has been submitted to the device,
  but the user cares about the time spent in the kernel, as well.

These are shortcomings of the generic blk-stat code, which doesn't quite fit
the ideal use case for Kyber. So, this replaces the statistics with a histogram
used to calculate percentiles of total latency and I/O latency, which we then
use to adjust depths in a slightly more intelligent manner:

- Sync and async writes are now the same domain.
- Discards are a separate domain.
- Domain queue depths are scaled by the ratio of the p99 total latency to the
  target latency (e.g., if the p99 latency is double the target latency, we
  will double the queue depth; if the p99 latency is half of the target
  latency, we can halve the queue depth).
- We use the I/O latency to determine whether we should scale queue depths
  down: we will only scale down if any domain's I/O latency exceeds the target
  latency, which is an indicator of congestion in the device.

These new heuristics are just as scalable as the heuristics they replace.
Signed-off-by: Omar Sandoval
---
 block/kyber-iosched.c | 497 ++++++++++++++++++++++++------------------
 1 file changed, 279 insertions(+), 218 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 08eb5295c18d..adc8e6393829 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -29,13 +29,16 @@
 #include "blk-mq-debugfs.h"
 #include "blk-mq-sched.h"
 #include "blk-mq-tag.h"
-#include "blk-stat.h"
 
-/* Scheduling domains. */
+/*
+ * Scheduling domains: the device is divided into multiple domains based on the
+ * request type.
+ */
 enum {
 	KYBER_READ,
-	KYBER_SYNC_WRITE,
-	KYBER_OTHER, /* Async writes, discard, etc. */
+	KYBER_WRITE,
+	KYBER_DISCARD,
+	KYBER_OTHER,
 	KYBER_NUM_DOMAINS,
 };
 
@@ -49,25 +52,82 @@ enum {
 };
 
 /*
- * Initial device-wide depths for each scheduling domain.
+ * Maximum device-wide depth for each scheduling domain.
  *
- * Even for fast devices with lots of tags like NVMe, you can saturate
- * the device with only a fraction of the maximum possible queue depth.
- * So, we cap these to a reasonable value.
+ * Even for fast devices with lots of tags like NVMe, you can saturate the
+ * device with only a fraction of the maximum possible queue depth. So, we cap
+ * these to a reasonable value.
  */
 static const unsigned int kyber_depth[] = {
 	[KYBER_READ] = 256,
-	[KYBER_SYNC_WRITE] = 128,
-	[KYBER_OTHER] = 64,
+	[KYBER_WRITE] = 128,
+	[KYBER_DISCARD] = 64,
+	[KYBER_OTHER] = 16,
 };
 
 /*
- * Scheduling domain batch sizes. We favor reads.
+ * Default latency targets for each scheduling domain.
+ */
+static const u64 kyber_latency_targets[] = {
+	[KYBER_READ] = 2 * NSEC_PER_MSEC,
+	[KYBER_WRITE] = 10 * NSEC_PER_MSEC,
+	[KYBER_DISCARD] = 5 * NSEC_PER_SEC,
+};
+
+/*
+ * Batch size (number of requests we'll dispatch in a row) for each scheduling
+ * domain.
  */
 static const unsigned int kyber_batch_size[] = {
 	[KYBER_READ] = 16,
-	[KYBER_SYNC_WRITE] = 8,
-	[KYBER_OTHER] = 8,
+	[KYBER_WRITE] = 8,
+	[KYBER_DISCARD] = 1,
+	[KYBER_OTHER] = 1,
+};
+
+/*
+ * Requests latencies are recorded in a histogram with buckets defined relative
+ * to the target latency:
+ *
+ * <= 1/4 * target latency
+ * <= 1/2 * target latency
+ * <= 3/4 * target latency
+ * <= target latency
+ * <= 1 1/4 * target latency
+ * <= 1 1/2 * target latency
+ * <= 1 3/4 * target latency
+ * > 1 3/4 * target latency
+ */
+enum {
+	/*
+	 * The width of the latency histogram buckets is
+	 * 1 / (1 << KYBER_LATENCY_SHIFT) * target latency.
+	 */
+	KYBER_LATENCY_SHIFT = 2,
+	/*
+	 * The first (1 << KYBER_LATENCY_SHIFT) buckets are <= target latency,
+	 * thus, "good".
+	 */
+	KYBER_GOOD_BUCKETS = 1 << KYBER_LATENCY_SHIFT,
+	/* There are also (1 << KYBER_LATENCY_SHIFT) "bad" buckets. */
+	KYBER_LATENCY_BUCKETS = 2 << KYBER_LATENCY_SHIFT,
+};
+
+/*
+ * We measure both the total latency and the I/O latency (i.e., latency after
+ * submitting to the device).
+ */
+enum {
+	KYBER_TOTAL_LATENCY,
+	KYBER_IO_LATENCY,
+};
+
+/*
+ * Per-cpu latency histograms: total latency and I/O latency for each scheduling
+ * domain except for KYBER_OTHER.
+ */
+struct kyber_cpu_latency {
+	atomic_t buckets[KYBER_OTHER][2][KYBER_LATENCY_BUCKETS];
 };
 
 /*
@@ -84,14 +144,9 @@ struct kyber_ctx_queue {
 } ____cacheline_aligned_in_smp;
 
 struct kyber_queue_data {
-	struct request_queue *q;
-
-	struct blk_stat_callback *cb;
-
 	/*
-	 * The device is divided into multiple scheduling domains based on the
-	 * request type. Each domain has a fixed number of in-flight requests of
-	 * that type device-wide, limited by these tokens.
+	 * Each scheduling domain has a limited number of in-flight requests
+	 * device-wide, limited by these tokens.
	 */
 	struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
 
@@ -101,8 +156,19 @@ struct kyber_queue_data {
	 */
 	unsigned int async_depth;
 
+	struct kyber_cpu_latency __percpu *cpu_latency;
+
+	/* Timer for stats aggregation and adjusting domain tokens. */
+	struct timer_list timer;
+
+	unsigned int latency_buckets[KYBER_OTHER][2][KYBER_LATENCY_BUCKETS];
+
+	unsigned long latency_timeout[KYBER_OTHER];
+
+	int domain_p99[KYBER_OTHER];
+
 	/* Target latencies in nanoseconds. */
-	u64 read_lat_nsec, write_lat_nsec;
+	u64 latency_targets[KYBER_OTHER];
 };
 
 struct kyber_hctx_data {
@@ -122,182 +188,165 @@ static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
 
 static unsigned int kyber_sched_domain(unsigned int op)
 {
-	if ((op & REQ_OP_MASK) == REQ_OP_READ)
+	switch (op & REQ_OP_MASK) {
+	case REQ_OP_READ:
 		return KYBER_READ;
-	else if ((op & REQ_OP_MASK) == REQ_OP_WRITE && op_is_sync(op))
-		return KYBER_SYNC_WRITE;
-	else
+	case REQ_OP_WRITE:
+		return KYBER_WRITE;
+	case REQ_OP_DISCARD:
+		return KYBER_DISCARD;
+	default:
 		return KYBER_OTHER;
+	}
 }
 
-enum {
-	NONE = 0,
-	GOOD = 1,
-	GREAT = 2,
-	BAD = -1,
-	AWFUL = -2,
-};
-
-#define IS_GOOD(status) ((status) > 0)
-#define IS_BAD(status) ((status) < 0)
-
-static int kyber_lat_status(struct blk_stat_callback *cb,
-			    unsigned int sched_domain, u64 target)
+static void flush_latency_buckets(struct kyber_queue_data *kqd,
+				  struct kyber_cpu_latency *cpu_latency,
				  unsigned int sched_domain, unsigned int type)
 {
-	u64 latency;
-
-	if (!cb->stat[sched_domain].nr_samples)
-		return NONE;
+	unsigned int *buckets = kqd->latency_buckets[sched_domain][type];
+	atomic_t *cpu_buckets = cpu_latency->buckets[sched_domain][type];
+	unsigned int bucket;
 
-	latency = cb->stat[sched_domain].mean;
-	if (latency >= 2 * target)
-		return AWFUL;
-	else if (latency > target)
-		return BAD;
-	else if (latency <= target / 2)
-		return GREAT;
-	else /* (latency <= target) */
-		return GOOD;
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS; bucket++)
+		buckets[bucket] += atomic_xchg(&cpu_buckets[bucket], 0);
 }
 
 /*
- * Adjust the read or synchronous write depth given the status of reads and
- * writes. The goal is that the latencies of the two domains are fair (i.e., if
- * one is good, then the other is good).
+ * Calculate the histogram bucket with the given percentile rank, or -1 if there
+ * aren't enough samples yet.
  */
-static void kyber_adjust_rw_depth(struct kyber_queue_data *kqd,
-				  unsigned int sched_domain, int this_status,
-				  int other_status)
+static int calculate_percentile(struct kyber_queue_data *kqd,
				unsigned int sched_domain, unsigned int type,
				unsigned int percentile)
 {
-	unsigned int orig_depth, depth;
+	unsigned int *buckets = kqd->latency_buckets[sched_domain][type];
+	unsigned int bucket, samples = 0, percentile_samples;
+
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS; bucket++)
+		samples += buckets[bucket];
+
+	if (!samples)
+		return -1;
 
 	/*
-	 * If this domain had no samples, or reads and writes are both good or
-	 * both bad, don't adjust the depth.
+	 * We do the calculation once we have 500 samples or one second passes
+	 * since the first sample was recorded, whichever comes first.
	 */
-	if (this_status == NONE ||
-	    (IS_GOOD(this_status) && IS_GOOD(other_status)) ||
-	    (IS_BAD(this_status) && IS_BAD(other_status)))
-		return;
-
-	orig_depth = depth = kqd->domain_tokens[sched_domain].sb.depth;
+	if (!kqd->latency_timeout[sched_domain])
+		kqd->latency_timeout[sched_domain] = max(jiffies + HZ, 1UL);
+	if (samples < 500 &&
+	    time_is_after_jiffies(kqd->latency_timeout[sched_domain])) {
+		return -1;
+	}
+	kqd->latency_timeout[sched_domain] = 0;
 
-	if (other_status == NONE) {
-		depth++;
-	} else {
-		switch (this_status) {
-		case GOOD:
-			if (other_status == AWFUL)
-				depth -= max(depth / 4, 1U);
-			else
-				depth -= max(depth / 8, 1U);
-			break;
-		case GREAT:
-			if (other_status == AWFUL)
-				depth /= 2;
-			else
-				depth -= max(depth / 4, 1U);
+	percentile_samples = DIV_ROUND_UP(samples * percentile, 100);
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS - 1; bucket++) {
+		if (buckets[bucket] >= percentile_samples)
 			break;
-		case BAD:
-			depth++;
-			break;
-		case AWFUL:
-			if (other_status == GREAT)
-				depth += 2;
-			else
-				depth++;
-			break;
-		}
+		percentile_samples -= buckets[bucket];
 	}
+	memset(buckets, 0, sizeof(kqd->latency_buckets[sched_domain][type]));
 
+	return bucket;
+}
+
+static void kyber_resize_domain(struct kyber_queue_data *kqd,
				unsigned int sched_domain, unsigned int depth)
+{
 	depth = clamp(depth, 1U, kyber_depth[sched_domain]);
-	if (depth != orig_depth)
+	if (depth != kqd->domain_tokens[sched_domain].sb.depth)
 		sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth);
 }
 
-/*
- * Adjust the depth of other requests given the status of reads and synchronous
- * writes. As long as either domain is doing fine, we don't throttle, but if
- * both domains are doing badly, we throttle heavily.
- */
-static void kyber_adjust_other_depth(struct kyber_queue_data *kqd,
-				     int read_status, int write_status,
-				     bool have_samples)
-{
-	unsigned int orig_depth, depth;
-	int status;
-
-	orig_depth = depth = kqd->domain_tokens[KYBER_OTHER].sb.depth;
-
-	if (read_status == NONE && write_status == NONE) {
-		depth += 2;
-	} else if (have_samples) {
-		if (read_status == NONE)
-			status = write_status;
-		else if (write_status == NONE)
-			status = read_status;
-		else
-			status = max(read_status, write_status);
-		switch (status) {
-		case GREAT:
-			depth += 2;
-			break;
-		case GOOD:
-			depth++;
-			break;
-		case BAD:
-			depth -= max(depth / 4, 1U);
-			break;
-		case AWFUL:
-			depth /= 2;
-			break;
+static void kyber_timer_fn(struct timer_list *t)
+{
+	struct kyber_queue_data *kqd = from_timer(kqd, t, timer);
+	unsigned int sched_domain;
+	int cpu;
+	bool bad = false;
+
+	/* Sum all of the per-cpu latency histograms. */
+	for_each_online_cpu(cpu) {
+		struct kyber_cpu_latency *cpu_latency;
+
+		cpu_latency = per_cpu_ptr(kqd->cpu_latency, cpu);
+		for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+			flush_latency_buckets(kqd, cpu_latency, sched_domain,
					      KYBER_TOTAL_LATENCY);
+			flush_latency_buckets(kqd, cpu_latency, sched_domain,
					      KYBER_IO_LATENCY);
 		}
 	}
 
-	depth = clamp(depth, 1U, kyber_depth[KYBER_OTHER]);
-	if (depth != orig_depth)
-		sbitmap_queue_resize(&kqd->domain_tokens[KYBER_OTHER], depth);
-}
-
-/*
- * Apply heuristics for limiting queue depths based on gathered latency
- * statistics.
- */
-static void kyber_stat_timer_fn(struct blk_stat_callback *cb)
-{
-	struct kyber_queue_data *kqd = cb->data;
-	int read_status, write_status;
-
-	read_status = kyber_lat_status(cb, KYBER_READ, kqd->read_lat_nsec);
-	write_status = kyber_lat_status(cb, KYBER_SYNC_WRITE, kqd->write_lat_nsec);
+	/*
+	 * Check if any domains have a high I/O latency, which might indicate
+	 * congestion in the device. Note that we use the p90; we don't want to
+	 * be too sensitive to outliers here.
+	 */
+	for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+		int p90;
 
-	kyber_adjust_rw_depth(kqd, KYBER_READ, read_status, write_status);
-	kyber_adjust_rw_depth(kqd, KYBER_SYNC_WRITE, write_status, read_status);
-	kyber_adjust_other_depth(kqd, read_status, write_status,
				 cb->stat[KYBER_OTHER].nr_samples != 0);
+		p90 = calculate_percentile(kqd, sched_domain, KYBER_IO_LATENCY,
					   90);
+		if (p90 >= KYBER_GOOD_BUCKETS)
+			bad = true;
+	}
 
 	/*
-	 * Continue monitoring latencies if we aren't hitting the targets or
-	 * we're still throttling other requests.
+	 * Adjust the scheduling domain depths. If we determined that there was
+	 * congestion, we throttle all domains with good latencies. Either way,
+	 * we ease up on throttling domains with bad latencies.
	 */
-	if (!blk_stat_is_active(kqd->cb) &&
-	    ((IS_BAD(read_status) || IS_BAD(write_status) ||
-	      kqd->domain_tokens[KYBER_OTHER].sb.depth < kyber_depth[KYBER_OTHER])))
-		blk_stat_activate_msecs(kqd->cb, 100);
+	for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+		unsigned int orig_depth, depth;
+		int p99;
+
+		p99 = calculate_percentile(kqd, sched_domain,
					   KYBER_TOTAL_LATENCY, 99);
+		/*
+		 * This is kind of subtle: different domains will not
+		 * necessarily have enough samples to calculate the latency
+		 * percentiles during the same window, so we have to remember
+		 * the p99 for the next time we observe congestion; once we do,
+		 * we don't want to throttle again until we get more data, so we
+		 * reset it to -1.
+		 */
+		if (bad) {
+			if (p99 < 0)
+				p99 = kqd->domain_p99[sched_domain];
+			kqd->domain_p99[sched_domain] = -1;
+		} else if (p99 >= 0) {
+			kqd->domain_p99[sched_domain] = p99;
+		}
+		if (p99 < 0)
+			continue;
+
+		/*
+		 * If this domain has bad latency, throttle less. Otherwise,
+		 * throttle more iff we determined that there is congestion.
+		 *
+		 * The new depth is scaled linearly with the p99 latency vs the
+		 * latency target. E.g., if the p99 is 3/4 of the target, then
+		 * we throttle down to 3/4 of the current depth, and if the p99
+		 * is 2x the target, then we double the depth.
+		 */
+		if (bad || p99 >= KYBER_GOOD_BUCKETS) {
+			orig_depth = kqd->domain_tokens[sched_domain].sb.depth;
+			depth = (orig_depth * (p99 + 1)) >> KYBER_LATENCY_SHIFT;
+			kyber_resize_domain(kqd, sched_domain, depth);
+		}
+	}
 }
 
-static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
+static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 {
 	/*
	 * All of the hardware queues have the same depth, so we can just grab
	 * the shift of the first one.
	 */
-	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
-}
-
-static int kyber_bucket_fn(const struct request *rq)
-{
-	return kyber_sched_domain(rq->cmd_flags);
+	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -307,16 +356,17 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 	int ret = -ENOMEM;
 	int i;
 
-	kqd = kmalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
+	kqd = kzalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
 	if (!kqd)
 		goto err;
 
-	kqd->q = q;
-
-	kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn, kyber_bucket_fn,
					  KYBER_NUM_DOMAINS, kqd);
-	if (!kqd->cb)
+	kqd->cpu_latency = alloc_percpu_gfp(struct kyber_cpu_latency,
					    GFP_KERNEL | __GFP_ZERO);
+	if (!kqd->cpu_latency)
 		goto err_kqd;
 
+	timer_setup(&kqd->timer, kyber_timer_fn, 0);
+
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
 		WARN_ON(!kyber_depth[i]);
 		WARN_ON(!kyber_batch_size[i]);
@@ -326,20 +376,22 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 		if (ret) {
 			while (--i >= 0)
 				sbitmap_queue_free(&kqd->domain_tokens[i]);
-			goto err_cb;
+			goto err_buckets;
 		}
 	}
 
-	shift = kyber_sched_tags_shift(kqd);
-	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
+	for (i = 0; i < KYBER_OTHER; i++) {
+		kqd->domain_p99[i] = 
-1; + kqd->latency_targets[i] = kyber_latency_targets[i]; + } - kqd->read_lat_nsec = 2000000ULL; - kqd->write_lat_nsec = 10000000ULL; + shift = kyber_sched_tags_shift(q); + kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U; return kqd; -err_cb: - blk_stat_free_callback(kqd->cb); +err_buckets: + free_percpu(kqd->cpu_latency); err_kqd: kfree(kqd); err: @@ -361,25 +413,24 @@ static int kyber_init_sched(struct request_queue *q, struct elevator_type *e) return PTR_ERR(kqd); } + blk_stat_enable_accounting(q); + eq->elevator_data = kqd; q->elevator = eq; - blk_stat_add_callback(q, kqd->cb); - return 0; } static void kyber_exit_sched(struct elevator_queue *e) { struct kyber_queue_data *kqd = e->elevator_data; - struct request_queue *q = kqd->q; int i; - blk_stat_remove_callback(q, kqd->cb); + del_timer_sync(&kqd->timer); for (i = 0; i < KYBER_NUM_DOMAINS; i++) sbitmap_queue_free(&kqd->domain_tokens[i]); - blk_stat_free_callback(kqd->cb); + free_percpu(kqd->cpu_latency); kfree(kqd); } @@ -547,40 +598,44 @@ static void kyber_finish_request(struct request *rq) rq_clear_domain_token(kqd, rq); } -static void kyber_completed_request(struct request *rq, u64 now) +static void add_latency_sample(struct kyber_cpu_latency *cpu_latency, + unsigned int sched_domain, unsigned int type, + u64 target, u64 latency) { - struct request_queue *q = rq->q; - struct kyber_queue_data *kqd = q->elevator->elevator_data; - unsigned int sched_domain; - u64 latency, target; + unsigned int bucket; + u64 divisor; - /* - * Check if this request met our latency goal. If not, quickly gather - * some statistics and start throttling. 
- */ - sched_domain = kyber_sched_domain(rq->cmd_flags); - switch (sched_domain) { - case KYBER_READ: - target = kqd->read_lat_nsec; - break; - case KYBER_SYNC_WRITE: - target = kqd->write_lat_nsec; - break; - default: - return; + if (latency > 0) { + divisor = max_t(u64, target >> KYBER_LATENCY_SHIFT, 1); + bucket = min_t(unsigned int, div64_u64(latency - 1, divisor), + KYBER_LATENCY_BUCKETS - 1); + } else { + bucket = 0; } - /* If we are already monitoring latencies, don't check again. */ - if (blk_stat_is_active(kqd->cb)) - return; + atomic_inc(&cpu_latency->buckets[sched_domain][type][bucket]); +} - if (now < rq->io_start_time_ns) +static void kyber_completed_request(struct request *rq, u64 now) +{ + struct kyber_queue_data *kqd = rq->q->elevator->elevator_data; + struct kyber_cpu_latency *cpu_latency; + unsigned int sched_domain; + u64 target; + + sched_domain = kyber_sched_domain(rq->cmd_flags); + if (sched_domain == KYBER_OTHER) return; - latency = now - rq->io_start_time_ns; + cpu_latency = get_cpu_ptr(kqd->cpu_latency); + target = kqd->latency_targets[sched_domain]; + add_latency_sample(cpu_latency, sched_domain, KYBER_TOTAL_LATENCY, + target, now - rq->start_time_ns); + add_latency_sample(cpu_latency, sched_domain, KYBER_IO_LATENCY, target, + now - rq->io_start_time_ns); + put_cpu_ptr(kqd->cpu_latency); - if (latency > target) - blk_stat_activate_msecs(kqd->cb, 10); + timer_reduce(&kqd->timer, jiffies + HZ / 10); } struct flush_kcq_data { @@ -778,17 +833,17 @@ static bool kyber_has_work(struct blk_mq_hw_ctx *hctx) return false; } -#define KYBER_LAT_SHOW_STORE(op) \ -static ssize_t kyber_##op##_lat_show(struct elevator_queue *e, \ - char *page) \ +#define KYBER_LAT_SHOW_STORE(domain, name) \ +static ssize_t kyber_##name##_lat_show(struct elevator_queue *e, \ + char *page) \ { \ struct kyber_queue_data *kqd = e->elevator_data; \ \ - return sprintf(page, "%llu\n", kqd->op##_lat_nsec); \ + return sprintf(page, "%llu\n", kqd->latency_targets[domain]); \ } \ \ 
-static ssize_t kyber_##op##_lat_store(struct elevator_queue *e, \ - const char *page, size_t count) \ +static ssize_t kyber_##name##_lat_store(struct elevator_queue *e, \ + const char *page, size_t count) \ { \ struct kyber_queue_data *kqd = e->elevator_data; \ unsigned long long nsec; \ @@ -798,12 +853,12 @@ static ssize_t kyber_##op##_lat_store(struct elevator_queue *e, \ if (ret) \ return ret; \ \ - kqd->op##_lat_nsec = nsec; \ + kqd->latency_targets[domain] = nsec; \ \ return count; \ } -KYBER_LAT_SHOW_STORE(read); -KYBER_LAT_SHOW_STORE(write); +KYBER_LAT_SHOW_STORE(KYBER_READ, read); +KYBER_LAT_SHOW_STORE(KYBER_WRITE, write); #undef KYBER_LAT_SHOW_STORE #define KYBER_LAT_ATTR(op) __ATTR(op##_lat_nsec, 0644, kyber_##op##_lat_show, kyber_##op##_lat_store) @@ -870,7 +925,8 @@ static int kyber_##name##_waiting_show(void *data, struct seq_file *m) \ return 0; \ } KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_READ, read) -KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_SYNC_WRITE, sync_write) +KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_WRITE, write) +KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_DISCARD, discard) KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_OTHER, other) #undef KYBER_DEBUGFS_DOMAIN_ATTRS @@ -892,8 +948,11 @@ static int kyber_cur_domain_show(void *data, struct seq_file *m) case KYBER_READ: seq_puts(m, "READ\n"); break; - case KYBER_SYNC_WRITE: - seq_puts(m, "SYNC_WRITE\n"); + case KYBER_WRITE: + seq_puts(m, "WRITE\n"); + break; + case KYBER_DISCARD: + seq_puts(m, "DISCARD\n"); break; case KYBER_OTHER: seq_puts(m, "OTHER\n"); @@ -918,7 +977,8 @@ static int kyber_batching_show(void *data, struct seq_file *m) {#name "_tokens", 0400, kyber_##name##_tokens_show} static const struct blk_mq_debugfs_attr kyber_queue_debugfs_attrs[] = { KYBER_QUEUE_DOMAIN_ATTRS(read), - KYBER_QUEUE_DOMAIN_ATTRS(sync_write), + KYBER_QUEUE_DOMAIN_ATTRS(write), + KYBER_QUEUE_DOMAIN_ATTRS(discard), KYBER_QUEUE_DOMAIN_ATTRS(other), {"async_depth", 0400, kyber_async_depth_show}, {}, @@ -930,7 +990,8 @@ static const struct blk_mq_debugfs_attr 
kyber_queue_debugfs_attrs[] = { {#name "_waiting", 0400, kyber_##name##_waiting_show} static const struct blk_mq_debugfs_attr kyber_hctx_debugfs_attrs[] = { KYBER_HCTX_DOMAIN_ATTRS(read), - KYBER_HCTX_DOMAIN_ATTRS(sync_write), + KYBER_HCTX_DOMAIN_ATTRS(write), + KYBER_HCTX_DOMAIN_ATTRS(discard), KYBER_HCTX_DOMAIN_ATTRS(other), {"cur_domain", 0400, kyber_cur_domain_show}, {"batching", 0400, kyber_batching_show}, From patchwork Thu Aug 9 20:26:47 2018 From: Omar Sandoval To: linux-block@vger.kernel.org Cc: Jens Axboe , kernel-team@fb.com Subject: [RFC PATCH 5/5] kyber: add tracepoints Date: Thu, 9 Aug 2018 13:26:47 -0700 Message-Id: <2f48af6da42d785330e11bc8a7df73aab7f702a0.1533846185.git.osandov@fb.com> X-Mailer:
git-send-email 2.18.0 From: Omar Sandoval When debugging Kyber, it's really useful to know what latencies we've been having and how the domain depths have been adjusted. Add two tracepoints, kyber_latency and kyber_adjust, to record that. Signed-off-by: Omar Sandoval --- block/kyber-iosched.c | 46 +++++++++++++--------- include/trace/events/kyber.h | 76 ++++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+), 18 deletions(-) create mode 100644 include/trace/events/kyber.h diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index adc8e6393829..d8ddcb4f2435 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -30,6 +30,9 @@ #include "blk-mq-sched.h" #include "blk-mq-tag.h" +#define CREATE_TRACE_POINTS +#include <trace/events/kyber.h> + /* * Scheduling domains: the device is divided into multiple domains based on the * request type. @@ -42,6 +45,13 @@ enum { KYBER_NUM_DOMAINS, }; +static const char *kyber_domain_names[] = { + [KYBER_READ] = "READ", + [KYBER_WRITE] = "WRITE", + [KYBER_DISCARD] = "DISCARD", + [KYBER_OTHER] = "OTHER", +}; + enum { /* * In order to prevent starvation of synchronous requests by a flood of @@ -122,6 +132,11 @@ enum { KYBER_IO_LATENCY, }; +static const char *kyber_latency_type_names[] = { + [KYBER_TOTAL_LATENCY] = "total", + [KYBER_IO_LATENCY] = "I/O", +}; + /* * Per-cpu latency histograms: total latency and I/O latency for each scheduling * domain except for KYBER_OTHER. @@ -144,6 +159,8 @@ struct kyber_ctx_queue { } ____cacheline_aligned_in_smp; struct kyber_queue_data { + struct request_queue *q; + /* * Each scheduling domain has a limited number of in-flight requests * device-wide, limited by these tokens.
@@ -249,6 +266,10 @@ static int calculate_percentile(struct kyber_queue_data *kqd, } memset(buckets, 0, sizeof(kqd->latency_buckets[sched_domain][type])); + trace_kyber_latency(kqd->q, kyber_domain_names[sched_domain], + kyber_latency_type_names[type], percentile, + bucket + 1, 1 << KYBER_LATENCY_SHIFT, samples); + return bucket; } @@ -256,8 +277,11 @@ static void kyber_resize_domain(struct kyber_queue_data *kqd, unsigned int sched_domain, unsigned int depth) { depth = clamp(depth, 1U, kyber_depth[sched_domain]); - if (depth != kqd->domain_tokens[sched_domain].sb.depth) + if (depth != kqd->domain_tokens[sched_domain].sb.depth) { sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth); + trace_kyber_adjust(kqd->q, kyber_domain_names[sched_domain], + depth); + } } static void kyber_timer_fn(struct timer_list *t) @@ -360,6 +384,8 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q) if (!kqd) goto err; + kqd->q = q; + kqd->cpu_latency = alloc_percpu_gfp(struct kyber_cpu_latency, GFP_KERNEL | __GFP_ZERO); if (!kqd->cpu_latency) @@ -944,23 +970,7 @@ static int kyber_cur_domain_show(void *data, struct seq_file *m) struct blk_mq_hw_ctx *hctx = data; struct kyber_hctx_data *khd = hctx->sched_data; - switch (khd->cur_domain) { - case KYBER_READ: - seq_puts(m, "READ\n"); - break; - case KYBER_WRITE: - seq_puts(m, "WRITE\n"); - break; - case KYBER_DISCARD: - seq_puts(m, "DISCARD\n"); - break; - case KYBER_OTHER: - seq_puts(m, "OTHER\n"); - break; - default: - seq_printf(m, "%u\n", khd->cur_domain); - break; - } + seq_printf(m, "%s\n", kyber_domain_names[khd->cur_domain]); return 0; } diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h new file mode 100644 index 000000000000..9bf85dbef492 --- /dev/null +++ b/include/trace/events/kyber.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM kyber + +#if !defined(_TRACE_KYBER_H) || defined(TRACE_HEADER_MULTI_READ) +#define 
_TRACE_KYBER_H + +#include <linux/blkdev.h> +#include <linux/tracepoint.h> + +#define DOMAIN_LEN 16 +#define LATENCY_TYPE_LEN 8 + +TRACE_EVENT(kyber_latency, + + TP_PROTO(struct request_queue *q, const char *domain, const char *type, + unsigned int percentile, unsigned int numerator, + unsigned int denominator, unsigned int samples), + + TP_ARGS(q, domain, type, percentile, numerator, denominator, samples), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __array( char, domain, DOMAIN_LEN ) + __array( char, type, LATENCY_TYPE_LEN ) + __field( u8, percentile ) + __field( u8, numerator ) + __field( u8, denominator ) + __field( unsigned int, samples ) + ), + + TP_fast_assign( + __entry->dev = disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent))); + strlcpy(__entry->domain, domain, DOMAIN_LEN); + strlcpy(__entry->type, type, LATENCY_TYPE_LEN); + __entry->percentile = percentile; + __entry->numerator = numerator; + __entry->denominator = denominator; + __entry->samples = samples; + ), + + TP_printk("%d,%d %s %s p%u %u/%u samples=%u", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->domain, + __entry->type, __entry->percentile, __entry->numerator, + __entry->denominator, __entry->samples) +); + +TRACE_EVENT(kyber_adjust, + + TP_PROTO(struct request_queue *q, const char *domain, + unsigned int depth), + + TP_ARGS(q, domain, depth), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __array( char, domain, DOMAIN_LEN ) + __field( unsigned int, depth ) + ), + + TP_fast_assign( + __entry->dev = disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent))); + strlcpy(__entry->domain, domain, DOMAIN_LEN); + __entry->depth = depth; + ), + + TP_printk("%d,%d %s %u", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->domain, + __entry->depth) +); + +#define _TRACE_KYBER_H +#endif /* _TRACE_KYBER_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h>
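For reference, the percentile walk in calculate_percentile() from the first patch above can be exercised in isolation. The sketch below is a stand-alone userspace rendering of that loop; KYBER_LATENCY_BUCKETS = 8 is an assumption here, since the enum defining it lives outside these hunks.

```c
#include <assert.h>

/* Assumed value; defined elsewhere in the series. */
#define KYBER_LATENCY_BUCKETS 8

/* DIV_ROUND_UP as in the kernel. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Find the first bucket such that the requested percentile of the samples
 * falls at or below it, mirroring the loop in calculate_percentile().
 */
static unsigned int percentile_bucket(const unsigned int buckets[KYBER_LATENCY_BUCKETS],
				      unsigned int samples, unsigned int percentile)
{
	unsigned int percentile_samples = DIV_ROUND_UP(samples * percentile, 100);
	unsigned int bucket;

	/* Walk buckets until the cumulative count reaches the percentile. */
	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS - 1; bucket++) {
		if (buckets[bucket] >= percentile_samples)
			break;
		percentile_samples -= buckets[bucket];
	}
	return bucket;
}
```

The kernel version additionally gates on having at least 500 samples (or a one-second timeout) and zeroes the histogram after reading it; this sketch keeps only the bucket walk itself.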
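Similarly, the bucketing in add_latency_sample() and the "scaled linearly" depth adjustment described in the kyber_timer_fn() comment can be sanity-checked in userspace. The constants below (KYBER_LATENCY_SHIFT = 2, hence 8 buckets each one quarter of the latency target wide) are assumptions inferred from the comments, since the enum is defined outside these hunks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; defined elsewhere in the series. */
#define KYBER_LATENCY_SHIFT 2
#define KYBER_LATENCY_BUCKETS (2 << KYBER_LATENCY_SHIFT)

/*
 * Map a latency sample to a histogram bucket, as add_latency_sample() does:
 * each bucket is 1/4 of the latency target wide, and everything past 2x the
 * target lands in the last bucket.
 */
static unsigned int latency_bucket(uint64_t target, uint64_t latency)
{
	uint64_t divisor, bucket;

	if (latency == 0)
		return 0;
	divisor = target >> KYBER_LATENCY_SHIFT;
	if (divisor == 0)
		divisor = 1;
	bucket = (latency - 1) / divisor;
	if (bucket > KYBER_LATENCY_BUCKETS - 1)
		bucket = KYBER_LATENCY_BUCKETS - 1;
	return (unsigned int)bucket;
}

/*
 * Scale the domain depth linearly by the upper edge of the p99 bucket
 * relative to the target: bucket 2 (latency <= 3/4 of target) gives 3/4 of
 * the depth, bucket 7 (2x target) doubles it. The real code then clamps the
 * result via kyber_resize_domain().
 */
static unsigned int scale_depth(unsigned int orig_depth, unsigned int p99_bucket)
{
	return (orig_depth * (p99_bucket + 1)) >> KYBER_LATENCY_SHIFT;
}
```

With an 8 ms target, a 6 ms sample lands in bucket 2, and anything beyond 16 ms saturates into bucket 7, which is what makes the "throttle more" branch double the depth at worst.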