From patchwork Fri Oct 13 23:35:31 2017
X-Patchwork-Submitter: Michael Lyle
X-Patchwork-Id: 10006121
From: Michael Lyle
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: axboe@fb.com, Coly Li, Eric Wheeler, Junhui Tang, Michael Lyle
Subject: [PATCH 04/15] bcache: rewrite multiple partitions support
Date: Fri, 13 Oct 2017 16:35:31 -0700
Message-Id: <20171013233542.20938-5-mlyle@lyle.org>
In-Reply-To: <20171013233542.20938-1-mlyle@lyle.org>
References: <20171013233542.20938-1-mlyle@lyle.org>

From: Coly Li

Current partition support of bcache is confusing and buggy. It tries to
track non-contiguous device minor numbers with an ida bit string, and it
mistakenly mixes the bcache device index with minor numbers. This design
has several negative results:

- The index in bcache device names is not consecutive under /dev/. If
  there are 3 bcache devices, their names will be:
	/dev/bcache0, /dev/bcache16, /dev/bcache32
  Only bcache names its devices in such an unusual way.
- The first minor number of each bcache device is tracked by the ida bit
  string. One bcache device occupies 16 bits, which is not a good idea;
  only one bit is needed.
- Because minor numbers and bcache device indexes are mixed, a device
  index is allocated by ida_simple_get(), but the first minor number is
  passed to ida_simple_remove() to release the device. This confused the
  original author too.

The root cause of the above errors is that bcache code should not handle
device minor numbers at all! The standard way to support multiple
partitions in the Linux kernel is:

- The device driver provides a major device number and indexes multiple
  device instances.
- The device driver neither allocates nor tracks device minor numbers; it
  only provides the first minor number of a given device instance and
  sets how many minor numbers (partitions) the device instance may have.
  Everything else is handled by block layer code; most of the details can
  be found in block/{genhd,partition-generic}.c.

This patch rewrites multiple partitions support for bcache. It makes the
whole thing clearer and uses the ida bit string more efficiently:

- The ida bit string only tracks the bcache device index, not minor
  numbers. For a bcache device with 128 partitions, only one bit in the
  ida bit string is needed.
- Device minor number and device index are separate concepts. The device
  index is used for /dev node naming and for ida bit string tracking. The
  minor number is calculated from the device index and is only used to
  initialize first_minor of a bcache device.
- The old limit of 16 partitions per bcache device does not follow any
  standard. This patch allows at most 128 partitions on a single bcache
  device, which is the limit of GPT (GUID Partition Table) and is
  supported by fdisk. Since a device minor number is 20 bits wide
  (MINORBITS) and each bcache device may have 128 partitions (7 bits),
  there can be 2^20 / 2^7 = 8192 bcache devices on a system. For most
  common single-server deployments nowadays, this should be enough.
[minor spelling fixes in commit message by Michael Lyle]

Signed-off-by: Coly Li
Cc: Eric Wheeler
Cc: Junhui Tang
Reviewed-by: Michael Lyle
Signed-off-by: Michael Lyle
---
 drivers/md/bcache/super.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index fc0a31b13ac4..a478d1ac0480 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -53,12 +53,15 @@ LIST_HEAD(bch_cache_sets);
 static LIST_HEAD(uncached_devices);
 
 static int bcache_major;
-static DEFINE_IDA(bcache_minor);
+static DEFINE_IDA(bcache_device_idx);
 static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
 
 #define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE)
-#define BCACHE_MINORS 16 /* partition support */
+/* limitation of partitions number on single bcache device */
+#define BCACHE_MINORS 128
+/* limitation of bcache devices number on single system */
+#define BCACHE_DEVICE_IDX_MAX ((1U << MINORBITS)/BCACHE_MINORS)
 
 /* Superblock */
 
@@ -721,6 +724,16 @@ static void bcache_device_attach(struct bcache_device *d, struct cache_set *c,
 	closure_get(&c->caching);
 }
 
+static inline int first_minor_to_idx(int first_minor)
+{
+	return (first_minor/BCACHE_MINORS);
+}
+
+static inline int idx_to_first_minor(int idx)
+{
+	return (idx * BCACHE_MINORS);
+}
+
 static void bcache_device_free(struct bcache_device *d)
 {
 	lockdep_assert_held(&bch_register_lock);
@@ -734,7 +747,8 @@ static void bcache_device_free(struct bcache_device *d)
 	if (d->disk && d->disk->queue)
 		blk_cleanup_queue(d->disk->queue);
 	if (d->disk) {
-		ida_simple_remove(&bcache_minor, d->disk->first_minor);
+		ida_simple_remove(&bcache_device_idx,
+				  first_minor_to_idx(d->disk->first_minor));
 		put_disk(d->disk);
 	}
 
@@ -751,7 +765,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 {
 	struct request_queue *q;
 	size_t n;
-	int minor;
+	int idx;
 
 	if (!d->stripe_size)
 		d->stripe_size = 1 << 31;
@@ -776,25 +790,24 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	if (!d->full_dirty_stripes)
 		return -ENOMEM;
 
-	minor = ida_simple_get(&bcache_minor, 0, MINORMASK + 1, GFP_KERNEL);
-	if (minor < 0)
-		return minor;
-
-	minor *= BCACHE_MINORS;
+	idx = ida_simple_get(&bcache_device_idx, 0,
+				BCACHE_DEVICE_IDX_MAX, GFP_KERNEL);
+	if (idx < 0)
+		return idx;
 
 	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
 					   BIOSET_NEED_BVECS |
 					   BIOSET_NEED_RESCUER)) ||
 	    !(d->disk = alloc_disk(BCACHE_MINORS))) {
-		ida_simple_remove(&bcache_minor, minor);
+		ida_simple_remove(&bcache_device_idx, idx);
 		return -ENOMEM;
 	}
 
 	set_capacity(d->disk, sectors);
-	snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", minor);
+	snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", idx);
 
 	d->disk->major = bcache_major;
-	d->disk->first_minor = minor;
+	d->disk->first_minor = idx_to_first_minor(idx);
 	d->disk->fops = &bcache_ops;
 	d->disk->private_data = d;