From patchwork Wed Dec 27 22:39:31 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Timofey Titovets
X-Patchwork-Id: 10134105
From: Timofey Titovets
To: linux-btrfs@vger.kernel.org
Cc: Timofey Titovets
Subject: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices
Date: Thu, 28 Dec 2017 01:39:31 +0300
Message-Id: <20171227223931.7878-1-nefelim4ag@gmail.com>
X-Mailer: git-send-email 2.15.1
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Currently the btrfs raid1/10 balancer distributes read requests across
mirrors based on pid % number of mirrors.

Update the logic so that it takes into account whether the underlying
devices are non-rotational.

If one of the mirrors is non-rotational, all read requests are directed
to the non-rotational device.

If both mirrors are non-rotational, calculate the sum of pending and
in-flight requests in the queue of each bdev and use the device with
the shortest queue.

P.S.
Inspired by md-raid1 read balancing

Signed-off-by: Timofey Titovets
---
 fs/btrfs/volumes.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9a04245003ab..98bc2433a920 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+static inline int bdev_get_queue_len(struct block_device *bdev)
+{
+	int sum = 0;
+	struct request_queue *rq = bdev_get_queue(bdev);
+
+	sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
+	sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
+
+	/*
+	 * Try to prevent switching on every sneeze
+	 * by rounding the result up to a multiple of 2
+	 */
+	return ALIGN(sum, 2);
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first, int num,
 			    int optimal, int dev_replace_is_ongoing)
 {
 	int i;
 	int tolerance;
+	struct block_device *bdev;
 	struct btrfs_device *srcdev;
+	bool all_bdev_nonrot = true;
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
@@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		srcdev = NULL;
 
+	/*
+	 * Optimal is expected to be pid % num
+	 * That's generally ok for spinning rust drives
+	 * But if one of the mirrors is non rotating,
+	 * that bdev can show better performance
+	 *
+	 * If one of the disks is non rotating:
+	 *  - set optimal to the non rotating device
+	 * If both disks are non rotating:
+	 *  - set optimal to the bdev with the least queue
+	 * If both disks are spinning rust:
+	 *  - leave the old pid % num
+	 */
+	for (i = 0; i < num; i++) {
+		bdev = map->stripes[i].dev->bdev;
+		if (!bdev)
+			continue;
+		if (blk_queue_nonrot(bdev_get_queue(bdev)))
+			optimal = i;
+		else
+			all_bdev_nonrot = false;
+	}
+
+	if (all_bdev_nonrot) {
+		int qlen;
+		/* Force the choice below by starting from a big number */
+		int optimal_dev_rq_count = 1 << 24;
+
+		for (i = 0; i < num; i++) {
+			bdev = map->stripes[i].dev->bdev;
+			if (!bdev)
+				continue;
+
+			qlen = bdev_get_queue_len(bdev);
+
+			if (qlen < optimal_dev_rq_count) {
+				optimal = i;
+				optimal_dev_rq_count = qlen;
+			}
+		}
+	}
+
 	/*
 	 * try to avoid the drive that is the source drive for a
 	 * dev-replace procedure, only choose it if no other non-missing
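
For readers who want to try the selection rule outside the kernel, here
is a minimal userspace sketch of the heuristic described in the commit
message. It is not the patch itself: struct mirror_model, model_queue_len()
and pick_mirror() are hypothetical names invented for illustration, the
queue counters are plain integers rather than request_queue fields, and
'fallback' stands in for the existing pid % num choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one mirror's block device. */
struct mirror_model {
	bool present;	/* false models a missing device (bdev == NULL) */
	bool nonrot;	/* true for non-rotational (SSD-like) devices */
	int queued;	/* requests waiting in the queue */
	int in_flight;	/* requests already dispatched */
};

/* Round the queue length up to an even number, like ALIGN(sum, 2). */
static int model_queue_len(const struct mirror_model *m)
{
	int sum = m->queued + m->in_flight;

	return (sum + 1) & ~1;
}

/*
 * Return the index of the mirror to read from.
 * 'fallback' models the old pid % num choice; it is kept only when
 * every present mirror is rotational.
 */
static int pick_mirror(const struct mirror_model *m, int num, int fallback)
{
	int i;
	int len;
	int best = fallback;
	int best_len = 1 << 24;	/* big initial value, first device wins */
	bool all_nonrot = true;

	assert(num > 0 && fallback >= 0 && fallback < num);

	/* Pass 1: prefer a non-rotational mirror if there is one. */
	for (i = 0; i < num; i++) {
		if (!m[i].present)
			continue;
		if (m[i].nonrot)
			best = i;
		else
			all_nonrot = false;
	}
	if (!all_nonrot)
		return best;

	/* Pass 2: all mirrors are non-rotational, take the shortest queue. */
	for (i = 0; i < num; i++) {
		if (!m[i].present)
			continue;
		len = model_queue_len(&m[i]);
		if (len < best_len) {
			best = i;
			best_len = len;
		}
	}
	return best;
}
```

Because queue lengths are rounded up to an even number, two devices whose
load differs by a single request can compare as equal, and the lower index
then wins. That is the "don't switch on every sneeze" behaviour the patch
comment refers to.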