From patchwork Thu Oct 12 09:15:46 2017
X-Patchwork-Submitter: Joseph Qi
X-Patchwork-Id: 10001275
From: Joseph Qi
Subject: [PATCH v2] blk-throttle: track read and write request individually
To: linux-block
Cc: Jens Axboe , Shaohua Li , boyu.mt@taobao.com,
 wenqing.lz@taobao.com, qijiang.qj@alibaba-inc.com
Message-ID: <8dde2c9d-51a0-dc8e-74e4-43d2f51d4426@gmail.com>
Date: Thu, 12 Oct 2017 17:15:46 +0800
X-Mailing-List: linux-block@vger.kernel.org

From: Joseph Qi

In a mixed read/write workload on an SSD, write latency is much lower
than read latency. But currently we only track and record read latency,
and then use it as the threshold base for both read and write io
latency accounting. As a result, write io latency will always be
considered good, and bad_bio_cnt stays much smaller than 20% of
bio_cnt. That means the tg being checked will be treated as idle most
of the time and will keep letting others dispatch more ios, even when
it is actually running under its low limit and wants that low limit to
be guaranteed, which is not what we expect.

So track read and write requests individually, which allows more
precise latency control for low limit idle detection.

Signed-off-by: Joseph Qi
Reviewed-by: Shaohua Li
---
 block/blk-throttle.c | 134 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 79 insertions(+), 55 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 17816a0..0897378 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -215,9 +215,9 @@ struct throtl_data
 	unsigned int scale;
 
-	struct latency_bucket tmp_buckets[LATENCY_BUCKET_SIZE];
-	struct avg_latency_bucket avg_buckets[LATENCY_BUCKET_SIZE];
-	struct latency_bucket __percpu *latency_buckets;
+	struct latency_bucket tmp_buckets[2][LATENCY_BUCKET_SIZE];
+	struct avg_latency_bucket avg_buckets[2][LATENCY_BUCKET_SIZE];
+	struct latency_bucket __percpu *latency_buckets[2];
 	unsigned long last_calculate_time;
 	unsigned long filtered_latency;
 
@@ -2040,10 +2040,10 @@ static void blk_throtl_update_idletime(struct throtl_grp *tg)
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 static void throtl_update_latency_buckets(struct throtl_data *td)
 {
-	struct avg_latency_bucket avg_latency[LATENCY_BUCKET_SIZE];
-	int i, cpu;
-	unsigned long last_latency = 0;
-	unsigned long latency;
+	struct avg_latency_bucket avg_latency[2][LATENCY_BUCKET_SIZE];
+	int i, cpu, rw;
+	unsigned long last_latency[2] = { 0 };
+	unsigned long latency[2];
 
 	if (!blk_queue_nonrot(td->queue))
 		return;
@@ -2052,56 +2052,67 @@ static void throtl_update_latency_buckets(struct throtl_data *td)
 	td->last_calculate_time = jiffies;
 
 	memset(avg_latency, 0, sizeof(avg_latency));
-	for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
-		struct latency_bucket *tmp = &td->tmp_buckets[i];
-
-		for_each_possible_cpu(cpu) {
-			struct latency_bucket *bucket;
-
-			/* this isn't race free, but ok in practice */
-			bucket = per_cpu_ptr(td->latency_buckets, cpu);
-			tmp->total_latency += bucket[i].total_latency;
-			tmp->samples += bucket[i].samples;
-			bucket[i].total_latency = 0;
-			bucket[i].samples = 0;
-		}
+	for (rw = READ; rw <= WRITE; rw++) {
+		for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
+			struct latency_bucket *tmp = &td->tmp_buckets[rw][i];
+
+			for_each_possible_cpu(cpu) {
+				struct latency_bucket *bucket;
+
+				/* this isn't race free, but ok in practice */
+				bucket = per_cpu_ptr(td->latency_buckets[rw],
+					cpu);
+				tmp->total_latency += bucket[i].total_latency;
+				tmp->samples += bucket[i].samples;
+				bucket[i].total_latency = 0;
+				bucket[i].samples = 0;
+			}
 
-		if (tmp->samples >= 32) {
-			int samples = tmp->samples;
+			if (tmp->samples >= 32) {
+				int samples = tmp->samples;
 
-			latency = tmp->total_latency;
+				latency[rw] = tmp->total_latency;
 
-			tmp->total_latency = 0;
-			tmp->samples = 0;
-			latency /= samples;
-			if (latency == 0)
-				continue;
-			avg_latency[i].latency = latency;
+				tmp->total_latency = 0;
+				tmp->samples = 0;
+				latency[rw] /= samples;
+				if (latency[rw] == 0)
+					continue;
+				avg_latency[rw][i].latency = latency[rw];
+			}
 		}
 	}
 
-	for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
-		if (!avg_latency[i].latency) {
-			if (td->avg_buckets[i].latency < last_latency)
-				td->avg_buckets[i].latency = last_latency;
-			continue;
-		}
+	for (rw = READ; rw <= WRITE; rw++) {
+		for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
+			if (!avg_latency[rw][i].latency) {
+				if (td->avg_buckets[rw][i].latency < last_latency[rw])
+					td->avg_buckets[rw][i].latency =
+						last_latency[rw];
+				continue;
+			}
 
-		if (!td->avg_buckets[i].valid)
-			latency = avg_latency[i].latency;
-		else
-			latency = (td->avg_buckets[i].latency * 7 +
-				avg_latency[i].latency) >> 3;
+			if (!td->avg_buckets[rw][i].valid)
+				latency[rw] = avg_latency[rw][i].latency;
+			else
+				latency[rw] = (td->avg_buckets[rw][i].latency * 7 +
+					avg_latency[rw][i].latency) >> 3;
 
-		td->avg_buckets[i].latency = max(latency, last_latency);
-		td->avg_buckets[i].valid = true;
-		last_latency = td->avg_buckets[i].latency;
+			td->avg_buckets[rw][i].latency = max(latency[rw],
+				last_latency[rw]);
+			td->avg_buckets[rw][i].valid = true;
+			last_latency[rw] = td->avg_buckets[rw][i].latency;
+		}
 	}
 
 	for (i = 0; i < LATENCY_BUCKET_SIZE; i++)
 		throtl_log(&td->service_queue,
-			"Latency bucket %d: latency=%ld, valid=%d", i,
-			td->avg_buckets[i].latency, td->avg_buckets[i].valid);
+			"Latency bucket %d: read latency=%ld, read valid=%d, "
+			"write latency=%ld, write valid=%d", i,
+			td->avg_buckets[READ][i].latency,
+			td->avg_buckets[READ][i].valid,
+			td->avg_buckets[WRITE][i].latency,
+			td->avg_buckets[WRITE][i].valid);
 }
 #else
 static inline void throtl_update_latency_buckets(struct throtl_data *td)
@@ -2244,16 +2255,17 @@ static void throtl_track_latency(struct throtl_data *td, sector_t size,
 	struct latency_bucket *latency;
 	int index;
 
-	if (!td || td->limit_index != LIMIT_LOW || op != REQ_OP_READ ||
+	if (!td || td->limit_index != LIMIT_LOW ||
+	    !(op == REQ_OP_READ || op == REQ_OP_WRITE) ||
 	    !blk_queue_nonrot(td->queue))
 		return;
 
 	index = request_bucket_index(size);
 
-	latency = get_cpu_ptr(td->latency_buckets);
+	latency = get_cpu_ptr(td->latency_buckets[op]);
 	latency[index].total_latency += time;
 	latency[index].samples++;
-	put_cpu_ptr(td->latency_buckets);
+	put_cpu_ptr(td->latency_buckets[op]);
 }
 
 void blk_throtl_stat_add(struct request *rq, u64 time_ns)
@@ -2272,6 +2284,7 @@ void blk_throtl_bio_endio(struct bio *bio)
 	unsigned long finish_time;
 	unsigned long start_time;
 	unsigned long lat;
+	int rw = bio_data_dir(bio);
 
 	tg = bio->bi_cg_private;
 	if (!tg)
@@ -2298,7 +2311,7 @@ void blk_throtl_bio_endio(struct bio *bio)
 
 		bucket = request_bucket_index(
 			blk_stat_size(&bio->bi_issue_stat));
-		threshold = tg->td->avg_buckets[bucket].latency +
+		threshold = tg->td->avg_buckets[rw][bucket].latency +
 			tg->latency_target;
 		if (lat > threshold)
 			tg->bad_bio_cnt++;
@@ -2389,9 +2402,16 @@ int blk_throtl_init(struct request_queue *q)
 	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
 	if (!td)
 		return -ENOMEM;
-	td->latency_buckets = __alloc_percpu(sizeof(struct latency_bucket) *
+	td->latency_buckets[READ] = __alloc_percpu(sizeof(struct latency_bucket) *
+		LATENCY_BUCKET_SIZE, __alignof__(u64));
+	if (!td->latency_buckets[READ]) {
+		kfree(td);
+		return -ENOMEM;
+	}
+	td->latency_buckets[WRITE] = __alloc_percpu(sizeof(struct latency_bucket) *
 		LATENCY_BUCKET_SIZE, __alignof__(u64));
-	if (!td->latency_buckets) {
+	if (!td->latency_buckets[WRITE]) {
+		free_percpu(td->latency_buckets[READ]);
 		kfree(td);
 		return -ENOMEM;
 	}
@@ -2410,7 +2430,8 @@ int blk_throtl_init(struct request_queue *q)
 	/* activate policy */
 	ret = blkcg_activate_policy(q, &blkcg_policy_throtl);
 	if (ret) {
-		free_percpu(td->latency_buckets);
+		free_percpu(td->latency_buckets[READ]);
+		free_percpu(td->latency_buckets[WRITE]);
 		kfree(td);
 	}
 	return ret;
@@ -2421,7 +2442,8 @@ void blk_throtl_exit(struct request_queue *q)
 	BUG_ON(!q->td);
 	throtl_shutdown_wq(q);
 	blkcg_deactivate_policy(q, &blkcg_policy_throtl);
-	free_percpu(q->td->latency_buckets);
+	free_percpu(q->td->latency_buckets[READ]);
+	free_percpu(q->td->latency_buckets[WRITE]);
 	kfree(q->td);
 }
 
@@ -2439,8 +2461,10 @@ void blk_throtl_register_queue(struct request_queue *q)
 	} else {
 		td->throtl_slice = DFL_THROTL_SLICE_HD;
 		td->filtered_latency = LATENCY_FILTERED_HD;
-		for (i = 0; i < LATENCY_BUCKET_SIZE; i++)
-			td->avg_buckets[i].latency = DFL_HD_BASELINE_LATENCY;
+		for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
+			td->avg_buckets[READ][i].latency = DFL_HD_BASELINE_LATENCY;
+			td->avg_buckets[WRITE][i].latency = DFL_HD_BASELINE_LATENCY;
+		}
 	}
 #ifndef CONFIG_BLK_DEV_THROTTLING_LOW
 	/* if no low limit, use previous default */
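
For reference, below is a minimal, stand-alone user-space sketch of the
idle-detection idea the description above relies on. The sketch_* names,
the numbers and the simplified 20% check are illustrative assumptions
only; in the kernel the corresponding logic lives in blk_throtl_bio_endio()
and throtl_tg_is_idle() in block/blk-throttle.c.

#include <stdio.h>
#include <stdbool.h>

enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };

struct sketch_tg {
	unsigned long avg_latency[2];	/* per-direction threshold base (usec) */
	unsigned long latency_target;	/* configured latency target (usec) */
	unsigned long bio_cnt;		/* completed bios */
	unsigned long bad_bio_cnt;	/* bios slower than their threshold */
};

/* account one completed bio of direction rw with latency lat */
static void sketch_bio_endio(struct sketch_tg *tg, int rw, unsigned long lat)
{
	unsigned long threshold = tg->avg_latency[rw] + tg->latency_target;

	tg->bio_cnt++;
	if (lat > threshold)
		tg->bad_bio_cnt++;
}

/* treat the group as idle if fewer than 20% of its bios were "bad" */
static bool sketch_tg_is_idle(const struct sketch_tg *tg)
{
	return tg->bad_bio_cnt * 5 < tg->bio_cnt;
}

int main(void)
{
	/* write baseline much lower than read, as on a typical SSD */
	struct sketch_tg tg = {
		.avg_latency = { [SKETCH_READ] = 200, [SKETCH_WRITE] = 50 },
		.latency_target = 100,
	};

	/* a 200us write now exceeds its own 150us threshold ... */
	sketch_bio_endio(&tg, SKETCH_WRITE, 200);
	/* ... while a 200us read is still well under 300us */
	sketch_bio_endio(&tg, SKETCH_READ, 200);

	printf("bad %lu / total %lu, idle=%d\n",
	       tg.bad_bio_cnt, tg.bio_cnt, (int)sketch_tg_is_idle(&tg));
	return 0;
}

With a single read-based threshold, the write completion above would be
compared against 200 + 100 instead of 50 + 100 and would never be counted
as bad; giving each direction its own threshold base is what this patch
changes.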