From patchwork Wed Jun 14 11:46:31 2023
X-Patchwork-Submitter: Hannes Reinecke
X-Patchwork-Id: 13279973
From: Hannes Reinecke
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Pankaj Raghav
Subject: [PATCH 1/7] brd: use XArray instead of radix-tree to index backing pages
Date: Wed, 14 Jun 2023 13:46:31 +0200
Message-Id: <20230614114637.89759-2-hare@suse.de>
In-Reply-To: <20230614114637.89759-1-hare@suse.de>
References: <20230614114637.89759-1-hare@suse.de>

From: Pankaj Raghav

XArray was introduced to hold large arrays of pointers with a simple API.
The XArray API also provides array semantics, which simplifies the way we
store and access the backing pages and makes the code significantly easier
to understand. No performance difference was noticed between the two
implementations using fio with direct=1 [1].
[1] Performance in KIOPS:

            | radix-tree | XArray | Diff
  write     |        315 |    313 | -0.6%
  randwrite |        286 |    290 | +1.3%
  read      |        330 |    335 | +1.5%
  randread  |        309 |    312 | +0.9%

Signed-off-by: Pankaj Raghav
Reviewed-by: Matthew Wilcox (Oracle)
---
 drivers/block/brd.c | 93 ++++++++++++---------------------------------
 1 file changed, 24 insertions(+), 69 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index bcad9b926b0c..2f71376afc71 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -19,7 +19,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -28,7 +28,7 @@
 #include

 /*
- * Each block ramdisk device has a radix_tree brd_pages of pages that stores
+ * Each block ramdisk device has a xarray brd_pages of pages that stores
  * the pages containing the block device's contents. A brd page's ->index is
  * its offset in PAGE_SIZE units. This is similar to, but in no way connected
  * with, the kernel's pagecache or buffer cache (which sit above our block
@@ -40,11 +40,9 @@ struct brd_device {
     struct list_head brd_list;

     /*
-     * Backing store of pages and lock to protect it. This is the contents
-     * of the block device.
+     * Backing store of pages. This is the contents of the block device.
      */
-    spinlock_t brd_lock;
-    struct radix_tree_root brd_pages;
+    struct xarray brd_pages;
     u64 brd_nr_pages;
 };

@@ -56,21 +54,8 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
     pgoff_t idx;
     struct page *page;

-    /*
-     * The page lifetime is protected by the fact that we have opened the
-     * device node -- brd pages will never be deleted under us, so we
-     * don't need any further locking or refcounting.
-     *
-     * This is strictly true for the radix-tree nodes as well (ie. we
-     * don't actually need the rcu_read_lock()), however that is not a
-     * documented feature of the radix-tree API so it is better to be
-     * safe here (we don't have total exclusion from radix tree updates
-     * here, only deletes).
-     */
-    rcu_read_lock();
     idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
-    page = radix_tree_lookup(&brd->brd_pages, idx);
-    rcu_read_unlock();
+    page = xa_load(&brd->brd_pages, idx);

     BUG_ON(page && page->index != idx);

@@ -83,7 +68,7 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
 {
     pgoff_t idx;
-    struct page *page;
+    struct page *page, *cur;
     int ret = 0;

     page = brd_lookup_page(brd, sector);
@@ -94,71 +79,42 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
     if (!page)
         return -ENOMEM;

-    if (radix_tree_maybe_preload(gfp)) {
-        __free_page(page);
-        return -ENOMEM;
-    }
+    xa_lock(&brd->brd_pages);

-    spin_lock(&brd->brd_lock);
     idx = sector >> PAGE_SECTORS_SHIFT;
     page->index = idx;
-    if (radix_tree_insert(&brd->brd_pages, idx, page)) {
+
+    cur = __xa_cmpxchg(&brd->brd_pages, idx, NULL, page, gfp);
+
+    if (unlikely(cur)) {
         __free_page(page);
-        page = radix_tree_lookup(&brd->brd_pages, idx);
-        if (!page)
-            ret = -ENOMEM;
-        else if (page->index != idx)
+        ret = xa_err(cur);
+        if (!ret && (cur->index != idx))
             ret = -EIO;
     } else {
         brd->brd_nr_pages++;
     }
-    spin_unlock(&brd->brd_lock);
-    radix_tree_preload_end();
+    xa_unlock(&brd->brd_pages);
+
     return ret;
 }

 /*
- * Free all backing store pages and radix tree. This must only be called when
+ * Free all backing store pages and xarray. This must only be called when
  * there are no other users of the device.
  */
-#define FREE_BATCH 16
 static void brd_free_pages(struct brd_device *brd)
 {
-    unsigned long pos = 0;
-    struct page *pages[FREE_BATCH];
-    int nr_pages;
-
-    do {
-        int i;
-
-        nr_pages = radix_tree_gang_lookup(&brd->brd_pages,
-                (void **)pages, pos, FREE_BATCH);
-
-        for (i = 0; i < nr_pages; i++) {
-            void *ret;
-
-            BUG_ON(pages[i]->index < pos);
-            pos = pages[i]->index;
-            ret = radix_tree_delete(&brd->brd_pages, pos);
-            BUG_ON(!ret || ret != pages[i]);
-            __free_page(pages[i]);
-        }
-
-        pos++;
+    struct page *page;
+    pgoff_t idx;

-        /*
-         * It takes 3.4 seconds to remove 80GiB ramdisk.
-         * So, we need cond_resched to avoid stalling the CPU.
-         */
-        cond_resched();
+    xa_for_each(&brd->brd_pages, idx, page) {
+        __free_page(page);
+        cond_resched_rcu();
+    }

-        /*
-         * This assumes radix_tree_gang_lookup always returns as
-         * many pages as possible. If the radix-tree code changes,
-         * so will this have to.
-         */
-    } while (nr_pages == FREE_BATCH);
+    xa_destroy(&brd->brd_pages);
 }

 /*
@@ -372,8 +328,7 @@ static int brd_alloc(int i)
     brd->brd_number = i;
     list_add_tail(&brd->brd_list, &brd_devices);

-    spin_lock_init(&brd->brd_lock);
-    INIT_RADIX_TREE(&brd->brd_pages, GFP_ATOMIC);
+    xa_init(&brd->brd_pages);

     snprintf(buf, DISK_NAME_LEN, "ram%d", i);
     if (!IS_ERR_OR_NULL(brd_debugfs_dir))

From patchwork Wed Jun 14 11:46:32 2023
X-Patchwork-Submitter: Hannes Reinecke
X-Patchwork-Id: 13279972
From: Hannes Reinecke
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 2/7] brd: convert to folios
Date: Wed, 14 Jun 2023 13:46:32 +0200
Message-Id: <20230614114637.89759-3-hare@suse.de>
In-Reply-To: <20230614114637.89759-1-hare@suse.de>
References: <20230614114637.89759-1-hare@suse.de>

Convert the driver to work on folios instead of pages.

Signed-off-by: Hannes Reinecke
---
 drivers/block/brd.c | 150 ++++++++++++++++++++++----------------------
 1 file changed, 74 insertions(+), 76 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2f71376afc71..24769d010fee 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -28,11 +28,10 @@
 #include

 /*
- * Each block ramdisk device has a xarray brd_pages of pages that stores
- * the pages containing the block device's contents. A brd page's ->index is
- * its offset in PAGE_SIZE units. This is similar to, but in no way connected
- * with, the kernel's pagecache or buffer cache (which sit above our block
- * device).
+ * Each block ramdisk device has a xarray of folios that stores the folios
+ * containing the block device's contents. A brd folio's ->index is its offset
+ * in PAGE_SIZE units. This is similar to, but in no way connected with,
+ * the kernel's pagecache or buffer cache (which sit above our block device).
  */
 struct brd_device {
     int brd_number;
@@ -40,81 +39,81 @@ struct brd_device {
     struct list_head brd_list;

     /*
-     * Backing store of pages. This is the contents of the block device.
+     * Backing store of folios. This is the contents of the block device.
      */
-    struct xarray brd_pages;
-    u64 brd_nr_pages;
+    struct xarray brd_folios;
+    u64 brd_nr_folios;
 };

 /*
- * Look up and return a brd's page for a given sector.
+ * Look up and return a brd's folio for a given sector.
  */
-static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
+static struct folio *brd_lookup_folio(struct brd_device *brd, sector_t sector)
 {
     pgoff_t idx;
-    struct page *page;
+    struct folio *folio;

-    idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
-    page = xa_load(&brd->brd_pages, idx);
+    idx = sector >> PAGE_SECTORS_SHIFT; /* sector to folio index */
+    folio = xa_load(&brd->brd_folios, idx);

-    BUG_ON(page && page->index != idx);
+    BUG_ON(folio && folio->index != idx);

-    return page;
+    return folio;
 }

 /*
- * Insert a new page for a given sector, if one does not already exist.
+ * Insert a new folio for a given sector, if one does not already exist.
  */
-static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
+static int brd_insert_folio(struct brd_device *brd, sector_t sector, gfp_t gfp)
 {
     pgoff_t idx;
-    struct page *page, *cur;
+    struct folio *folio, *cur;
     int ret = 0;

-    page = brd_lookup_page(brd, sector);
-    if (page)
+    folio = brd_lookup_folio(brd, sector);
+    if (folio)
         return 0;

-    page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
-    if (!page)
+    folio = folio_alloc(gfp | __GFP_ZERO | __GFP_HIGHMEM, 0);
+    if (!folio)
         return -ENOMEM;

-    xa_lock(&brd->brd_pages);
+    xa_lock(&brd->brd_folios);

     idx = sector >> PAGE_SECTORS_SHIFT;
-    page->index = idx;
+    folio->index = idx;

-    cur = __xa_cmpxchg(&brd->brd_pages, idx, NULL, page, gfp);
+    cur = __xa_cmpxchg(&brd->brd_folios, idx, NULL, folio, gfp);

     if (unlikely(cur)) {
-        __free_page(page);
+        folio_put(folio);
         ret = xa_err(cur);
         if (!ret && (cur->index != idx))
             ret = -EIO;
     } else {
-        brd->brd_nr_pages++;
+        brd->brd_nr_folios++;
     }

-    xa_unlock(&brd->brd_pages);
+    xa_unlock(&brd->brd_folios);

     return ret;
 }

 /*
- * Free all backing store pages and xarray. This must only be called when
+ * Free all backing store folios and xarray. This must only be called when
  * there are no other users of the device.
  */
-static void brd_free_pages(struct brd_device *brd)
+static void brd_free_folios(struct brd_device *brd)
 {
-    struct page *page;
+    struct folio *folio;
     pgoff_t idx;

-    xa_for_each(&brd->brd_pages, idx, page) {
-        __free_page(page);
+    xa_for_each(&brd->brd_folios, idx, folio) {
+        folio_put(folio);
         cond_resched_rcu();
     }

-    xa_destroy(&brd->brd_pages);
+    xa_destroy(&brd->brd_folios);
 }

 /*
@@ -128,12 +127,12 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
     int ret;

     copy = min_t(size_t, n, PAGE_SIZE - offset);
-    ret = brd_insert_page(brd, sector, gfp);
+    ret = brd_insert_folio(brd, sector, gfp);
     if (ret)
         return ret;
     if (copy < n) {
         sector += copy >> SECTOR_SHIFT;
-        ret = brd_insert_page(brd, sector, gfp);
+        ret = brd_insert_folio(brd, sector, gfp);
     }
     return ret;
 }
@@ -144,29 +143,29 @@ static void copy_to_brd(struct brd_device *brd, const void *src,
			sector_t sector, size_t n)
 {
-    struct page *page;
+    struct folio *folio;
     void *dst;
     unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
     size_t copy;

     copy = min_t(size_t, n, PAGE_SIZE - offset);
-    page = brd_lookup_page(brd, sector);
-    BUG_ON(!page);
+    folio = brd_lookup_folio(brd, sector);
+    BUG_ON(!folio);

-    dst = kmap_atomic(page);
-    memcpy(dst + offset, src, copy);
-    kunmap_atomic(dst);
+    dst = kmap_local_folio(folio, offset);
+    memcpy(dst, src, copy);
+    kunmap_local(dst);

     if (copy < n) {
         src += copy;
         sector += copy >> SECTOR_SHIFT;
         copy = n - copy;
-        page = brd_lookup_page(brd, sector);
-        BUG_ON(!page);
+        folio = brd_lookup_folio(brd, sector);
+        BUG_ON(!folio);

-        dst = kmap_atomic(page);
+        dst = kmap_local_folio(folio, 0);
         memcpy(dst, src, copy);
-        kunmap_atomic(dst);
+        kunmap_local(dst);
     }
 }
@@ -176,17 +175,17 @@ static void copy_from_brd(void *dst, struct brd_device *brd,
			sector_t sector, size_t n)
 {
-    struct page *page;
+    struct folio *folio;
     void *src;
     unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
     size_t copy;

     copy = min_t(size_t, n, PAGE_SIZE - offset);
-    page = brd_lookup_page(brd, sector);
-    if (page) {
-        src = kmap_atomic(page);
-        memcpy(dst, src + offset, copy);
-        kunmap_atomic(src);
+    folio = brd_lookup_folio(brd, sector);
+    if (folio) {
+        src = kmap_local_folio(folio, offset);
+        memcpy(dst, src, copy);
+        kunmap_local(src);
     } else
         memset(dst, 0, copy);
@@ -194,20 +193,20 @@
         dst += copy;
         sector += copy >> SECTOR_SHIFT;
         copy = n - copy;
-        page = brd_lookup_page(brd, sector);
-        if (page) {
-            src = kmap_atomic(page);
+        folio = brd_lookup_folio(brd, sector);
+        if (folio) {
+            src = kmap_local_folio(folio, 0);
             memcpy(dst, src, copy);
-            kunmap_atomic(src);
+            kunmap_local(src);
         } else
             memset(dst, 0, copy);
     }
 }

 /*
- * Process a single bvec of a bio.
+ * Process a single folio of a bio.
  */
-static int brd_do_bvec(struct brd_device *brd, struct page *page,
+static int brd_do_folio(struct brd_device *brd, struct folio *folio,
			unsigned int len, unsigned int off, blk_opf_t opf,
			sector_t sector)
 {
@@ -217,7 +216,7 @@
     if (op_is_write(opf)) {
         /*
          * Must use NOIO because we don't want to recurse back into the
-         * block or filesystem layers from page reclaim.
+         * block or filesystem layers from folio reclaim.
          */
         gfp_t gfp = opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO;
@@ -226,15 +225,15 @@
         goto out;
     }

-    mem = kmap_atomic(page);
+    mem = kmap_local_folio(folio, off);
     if (!op_is_write(opf)) {
-        copy_from_brd(mem + off, brd, sector, len);
-        flush_dcache_page(page);
+        copy_from_brd(mem, brd, sector, len);
+        flush_dcache_folio(folio);
     } else {
-        flush_dcache_page(page);
-        copy_to_brd(brd, mem + off, sector, len);
+        flush_dcache_folio(folio);
+        copy_to_brd(brd, mem, sector, len);
     }
-    kunmap_atomic(mem);
+    kunmap_local(mem);

 out:
     return err;
@@ -244,19 +243,18 @@ static void brd_submit_bio(struct bio *bio)
 {
     struct brd_device *brd = bio->bi_bdev->bd_disk->private_data;
     sector_t sector = bio->bi_iter.bi_sector;
-    struct bio_vec bvec;
-    struct bvec_iter iter;
+    struct folio_iter iter;

-    bio_for_each_segment(bvec, bio, iter) {
-        unsigned int len = bvec.bv_len;
+    bio_for_each_folio_all(iter, bio) {
+        unsigned int len = iter.length;
         int err;

         /* Don't support un-aligned buffer */
-        WARN_ON_ONCE((bvec.bv_offset & (SECTOR_SIZE - 1)) ||
+        WARN_ON_ONCE((iter.offset & (SECTOR_SIZE - 1)) ||
                 (len & (SECTOR_SIZE - 1)));

-        err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
-                bio->bi_opf, sector);
+        err = brd_do_folio(brd, iter.folio, len, iter.offset,
+                bio->bi_opf, sector);
         if (err) {
             if (err == -ENOMEM && bio->bi_opf & REQ_NOWAIT) {
                 bio_wouldblock_error(bio);
@@ -328,12 +326,12 @@ static int brd_alloc(int i)
     brd->brd_number = i;
     list_add_tail(&brd->brd_list, &brd_devices);

-    xa_init(&brd->brd_pages);
+    xa_init(&brd->brd_folios);

     snprintf(buf, DISK_NAME_LEN, "ram%d", i);
     if (!IS_ERR_OR_NULL(brd_debugfs_dir))
         debugfs_create_u64(buf, 0444, brd_debugfs_dir,
-                   &brd->brd_nr_pages);
+                   &brd->brd_nr_folios);

     disk = brd->brd_disk = blk_alloc_disk(NUMA_NO_NODE);
     if (!disk)
@@ -388,7 +386,7 @@ static void brd_cleanup(void)
     list_for_each_entry_safe(brd, next, &brd_devices, brd_list) {
         del_gendisk(brd->brd_disk);
         put_disk(brd->brd_disk);
-        brd_free_pages(brd);
+        brd_free_folios(brd);
         list_del(&brd->brd_list);
         kfree(brd);
     }
@@ -419,7 +417,7 @@ static int __init brd_init(void)
     brd_check_and_reset_par();

-    brd_debugfs_dir = debugfs_create_dir("ramdisk_pages", NULL);
+    brd_debugfs_dir = debugfs_create_dir("ramdisk_folios", NULL);

     for (i = 0; i < rd_nr; i++) {
         err = brd_alloc(i);

From patchwork Wed Jun 14 11:46:33 2023
X-Patchwork-Submitter: Hannes Reinecke
X-Patchwork-Id: 13279968
From: Hannes Reinecke
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 3/7] brd: abstract page_size conventions
Date: Wed, 14 Jun 2023 13:46:33 +0200
Message-Id: <20230614114637.89759-4-hare@suse.de>
In-Reply-To: <20230614114637.89759-1-hare@suse.de>
References: <20230614114637.89759-1-hare@suse.de>

In preparation for changing the block size, abstract away references to
PAGE_SIZE and friends.
Signed-off-by: Hannes Reinecke
---
 drivers/block/brd.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 24769d010fee..71d3d8af8b0d 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -45,6 +45,23 @@ struct brd_device {
     u64 brd_nr_folios;
 };

+#define BRD_SECTOR_SHIFT(b) (PAGE_SHIFT - SECTOR_SHIFT)
+
+static pgoff_t brd_sector_index(struct brd_device *brd, sector_t sector)
+{
+    pgoff_t idx;
+
+    idx = sector >> BRD_SECTOR_SHIFT(brd);
+    return idx;
+}
+
+static int brd_sector_offset(struct brd_device *brd, sector_t sector)
+{
+    unsigned int rd_sector_mask = (1 << BRD_SECTOR_SHIFT(brd)) - 1;
+
+    return ((unsigned int)sector & rd_sector_mask) << SECTOR_SHIFT;
+}
+
 /*
  * Look up and return a brd's folio for a given sector.
  */
@@ -53,7 +70,7 @@ static struct folio *brd_lookup_folio(struct brd_device *brd, sector_t sector)
     pgoff_t idx;
     struct folio *folio;

-    idx = sector >> PAGE_SECTORS_SHIFT; /* sector to folio index */
+    idx = brd_sector_index(brd, sector); /* sector to folio index */
     folio = xa_load(&brd->brd_folios, idx);

     BUG_ON(folio && folio->index != idx);
@@ -68,19 +85,20 @@ static int brd_insert_folio(struct brd_device *brd, sector_t sector, gfp_t gfp)
 {
     pgoff_t idx;
     struct folio *folio, *cur;
+    unsigned int rd_sector_order = get_order(PAGE_SIZE);
     int ret = 0;

     folio = brd_lookup_folio(brd, sector);
     if (folio)
         return 0;

-    folio = folio_alloc(gfp | __GFP_ZERO | __GFP_HIGHMEM, 0);
+    folio = folio_alloc(gfp | __GFP_ZERO | __GFP_HIGHMEM, rd_sector_order);
     if (!folio)
         return -ENOMEM;

     xa_lock(&brd->brd_folios);

-    idx = sector >> PAGE_SECTORS_SHIFT;
+    idx = brd_sector_index(brd, sector);
     folio->index = idx;

     cur = __xa_cmpxchg(&brd->brd_folios, idx, NULL, folio, gfp);
@@ -122,11 +140,12 @@ static void brd_free_folios(struct brd_device *brd)
 static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
			gfp_t gfp)
 {
-    unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
+    unsigned int rd_sector_size = PAGE_SIZE;
+    unsigned int offset = brd_sector_offset(brd, sector);
     size_t copy;
     int ret;

-    copy = min_t(size_t, n, PAGE_SIZE - offset);
+    copy = min_t(size_t, n, rd_sector_size - offset);
     ret = brd_insert_folio(brd, sector, gfp);
     if (ret)
         return ret;
@@ -145,10 +164,11 @@ static void copy_to_brd(struct brd_device *brd, const void *src,
 {
     struct folio *folio;
     void *dst;
-    unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
+    unsigned int rd_sector_size = PAGE_SIZE;
+    unsigned int offset = brd_sector_offset(brd, sector);
     size_t copy;

-    copy = min_t(size_t, n, PAGE_SIZE - offset);
+    copy = min_t(size_t, n, rd_sector_size - offset);
     folio = brd_lookup_folio(brd, sector);
     BUG_ON(!folio);
@@ -177,10 +197,11 @@ static void copy_from_brd(void *dst, struct brd_device *brd,
 {
     struct folio *folio;
     void *src;
-    unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
+    unsigned int rd_sector_size = PAGE_SIZE;
+    unsigned int offset = brd_sector_offset(brd, sector);
     size_t copy;

-    copy = min_t(size_t, n, PAGE_SIZE - offset);
+    copy = min_t(size_t, n, rd_sector_size - offset);
     folio = brd_lookup_folio(brd, sector);
     if (folio) {
         src = kmap_local_folio(folio, offset);

From patchwork Wed Jun 14 11:46:34 2023
X-Patchwork-Submitter: Hannes Reinecke
X-Patchwork-Id: 13279970
From: Hannes Reinecke
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 4/7] brd: make sector size configurable
Date: Wed, 14 Jun 2023 13:46:34 +0200
Message-Id: <20230614114637.89759-5-hare@suse.de>
In-Reply-To: <20230614114637.89759-1-hare@suse.de>
References: <20230614114637.89759-1-hare@suse.de>

Add a module option 'rd_blksize' to allow the user to change the sector
size of the RAM disks.

Signed-off-by: Hannes Reinecke
---
 drivers/block/brd.c | 50 +++++++++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 71d3d8af8b0d..2ebb5532a204 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -30,7 +30,7 @@
 /*
  * Each block ramdisk device has a xarray of folios that stores the folios
  * containing the block device's contents. A brd folio's ->index is its offset
- * in PAGE_SIZE units. This is similar to, but in no way connected with,
+ * in brd_sector_size units. This is similar to, but in no way connected with,
  * the kernel's pagecache or buffer cache (which sit above our block device).
 */
 struct brd_device {
@@ -43,9 +43,11 @@ struct brd_device {
	 */
	struct xarray brd_folios;
	u64 brd_nr_folios;
+	unsigned int brd_sector_shift;
+	unsigned int brd_sector_size;
 };

-#define BRD_SECTOR_SHIFT(b)	(PAGE_SHIFT - SECTOR_SHIFT)
+#define BRD_SECTOR_SHIFT(b)	((b)->brd_sector_shift - SECTOR_SHIFT)

 static pgoff_t brd_sector_index(struct brd_device *brd, sector_t sector)
 {
@@ -85,7 +87,7 @@ static int brd_insert_folio(struct brd_device *brd, sector_t sector, gfp_t gfp)
 {
	pgoff_t idx;
	struct folio *folio, *cur;
-	unsigned int rd_sector_order = get_order(PAGE_SIZE);
+	unsigned int rd_sector_order = get_order(brd->brd_sector_size);
	int ret = 0;

	folio = brd_lookup_folio(brd, sector);
@@ -140,7 +142,7 @@ static void brd_free_folios(struct brd_device *brd)
 static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
			     gfp_t gfp)
 {
-	unsigned int rd_sector_size = PAGE_SIZE;
+	unsigned int rd_sector_size = brd->brd_sector_size;
	unsigned int offset = brd_sector_offset(brd, sector);
	size_t copy;
	int ret;
@@ -164,7 +166,7 @@ static void copy_to_brd(struct brd_device *brd, const void *src,
 {
	struct folio *folio;
	void *dst;
-	unsigned int rd_sector_size = PAGE_SIZE;
+	unsigned int rd_sector_size = brd->brd_sector_size;
	unsigned int offset = brd_sector_offset(brd, sector);
	size_t copy;
@@ -197,7 +199,7 @@ static void copy_from_brd(void *dst, struct brd_device *brd,
 {
	struct folio *folio;
	void *src;
-	unsigned int rd_sector_size = PAGE_SIZE;
+	unsigned int rd_sector_size = brd->brd_sector_size;
	unsigned int offset = brd_sector_offset(brd, sector);
	size_t copy;
@@ -310,6 +312,10 @@ static int max_part = 1;
 module_param(max_part, int, 0444);
 MODULE_PARM_DESC(max_part, "Num Minors to reserve between devices");

+static unsigned int rd_blksize = PAGE_SIZE;
+module_param(rd_blksize, uint, 0444);
+MODULE_PARM_DESC(rd_blksize, "Blocksize of each RAM disk in bytes.");
+
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(RAMDISK_MAJOR);
 MODULE_ALIAS("rd");
@@ -336,6 +342,7 @@ static int brd_alloc(int i)
	struct brd_device *brd;
	struct gendisk *disk;
	char buf[DISK_NAME_LEN];
+	unsigned int rd_max_sectors;
	int err = -ENOMEM;

	list_for_each_entry(brd, &brd_devices, brd_list)
@@ -346,6 +353,25 @@ static int brd_alloc(int i)
		return -ENOMEM;
	brd->brd_number = i;
	list_add_tail(&brd->brd_list, &brd_devices);
+	brd->brd_sector_shift = ilog2(rd_blksize);
+	if ((1ULL << brd->brd_sector_shift) != rd_blksize) {
+		pr_err("rd_blksize %d is not supported\n", rd_blksize);
+		err = -EINVAL;
+		goto out_free_dev;
+	}
+	if (rd_blksize < SECTOR_SIZE) {
+		pr_err("rd_blksize must be at least 512 bytes\n");
+		err = -EINVAL;
+		goto out_free_dev;
+	}
+	/* We can't allocate more than MAX_ORDER pages */
+	rd_max_sectors = (1ULL << MAX_ORDER) << BRD_SECTOR_SHIFT(brd);
+	if (rd_blksize > rd_max_sectors) {
+		pr_err("rd_blocksize too large\n");
+		err = -EINVAL;
+		goto out_free_dev;
+	}
+	brd->brd_sector_size = rd_blksize;

	xa_init(&brd->brd_folios);
@@ -365,15 +391,9 @@ static int brd_alloc(int i)
	disk->private_data = brd;
	strscpy(disk->disk_name, buf, DISK_NAME_LEN);
	set_capacity(disk, rd_size * 2);
-
-	/*
-	 * This is so fdisk will align partitions on 4k, because of
-	 * direct_access API needing 4k alignment, returning a PFN
-	 * (This is only a problem on very small devices <= 4M,
-	 * otherwise fdisk will align on 1M.
-	 * Regardless this call
-	 * is harmless)
-	 */
-	blk_queue_physical_block_size(disk->queue, PAGE_SIZE);
+
+	blk_queue_physical_block_size(disk->queue, rd_blksize);
+	blk_queue_max_hw_sectors(disk->queue, 1ULL << (MAX_ORDER + PAGE_SECTORS_SHIFT));

	/* Tell the block layer that this is not a rotational device */
	blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);

From patchwork Wed Jun 14 11:46:35 2023
X-Patchwork-Id: 13279971
From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 5/7] brd: make logical sector size configurable
Date: Wed, 14 Jun 2023 13:46:35 +0200
Message-Id: <20230614114637.89759-6-hare@suse.de>

Add a module option 'rd_logical_blksize' to allow the user to change
the logical sector size of the RAM disks.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/block/brd.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2ebb5532a204..a9f3c6591e75 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -45,9 +45,11 @@ struct brd_device {
	u64 brd_nr_folios;
	unsigned int brd_sector_shift;
	unsigned int brd_sector_size;
+	unsigned int brd_logical_sector_shift;
+	unsigned int brd_logical_sector_size;
 };

-#define BRD_SECTOR_SHIFT(b)	((b)->brd_sector_shift - SECTOR_SHIFT)
+#define BRD_SECTOR_SHIFT(b)	((b)->brd_sector_shift - (b)->brd_logical_sector_shift)

 static pgoff_t brd_sector_index(struct brd_device *brd, sector_t sector)
 {
@@ -61,7 +63,7 @@ static int brd_sector_offset(struct brd_device *brd, sector_t sector)
 {
	unsigned int rd_sector_mask = (1 << BRD_SECTOR_SHIFT(brd)) - 1;

-	return ((unsigned int)sector & rd_sector_mask) << SECTOR_SHIFT;
+	return ((unsigned int)sector & rd_sector_mask) << brd->brd_logical_sector_shift;
 }

 /*
@@ -152,7 +154,7 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
	if (ret)
		return ret;
	if (copy < n) {
-		sector += copy >> SECTOR_SHIFT;
+		sector += copy >> brd->brd_logical_sector_shift;
		ret = brd_insert_folio(brd, sector, gfp);
	}
	return ret;
@@ -180,7 +182,7 @@ static void copy_to_brd(struct brd_device *brd, const void *src,

	if (copy < n) {
		src += copy;
-		sector += copy >> SECTOR_SHIFT;
+		sector += copy >> brd->brd_logical_sector_shift;
		copy = n - copy;
		folio = brd_lookup_folio(brd, sector);
		BUG_ON(!folio);
@@ -214,7 +216,7 @@ static void copy_from_brd(void *dst, struct brd_device *brd,

	if (copy < n) {
		dst += copy;
-		sector += copy >> SECTOR_SHIFT;
+		sector += copy >> brd->brd_logical_sector_shift;
		copy = n - copy;
		folio = brd_lookup_folio(brd, sector);
		if (folio) {
@@ -273,8 +275,8 @@ static void brd_submit_bio(struct bio *bio)
		int err;

		/* Don't support un-aligned buffer */
-		WARN_ON_ONCE((iter.offset & (SECTOR_SIZE - 1)) ||
-			     (len & (SECTOR_SIZE - 1)));
+		WARN_ON_ONCE((iter.offset & (brd->brd_logical_sector_size - 1)) ||
+			     (len & (brd->brd_logical_sector_size - 1)));

		err = brd_do_folio(brd, iter.folio, len, iter.offset,
				   bio->bi_opf, sector);
@@ -286,7 +288,7 @@ static void brd_submit_bio(struct bio *bio)
			bio_io_error(bio);
			return;
		}
-		sector += len >> SECTOR_SHIFT;
+		sector += len >> brd->brd_logical_sector_shift;
	}

	bio_endio(bio);
@@ -316,6 +318,10 @@ static unsigned int rd_blksize = PAGE_SIZE;
 module_param(rd_blksize, uint, 0444);
 MODULE_PARM_DESC(rd_blksize, "Blocksize of each RAM disk in bytes.");

+static unsigned int rd_logical_blksize = SECTOR_SIZE;
+module_param(rd_logical_blksize, uint, 0444);
+MODULE_PARM_DESC(rd_logical_blksize, "Logical blocksize of each RAM disk in bytes.");
+
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(RAMDISK_MAJOR);
 MODULE_ALIAS("rd");
@@ -373,6 +379,21 @@ static int brd_alloc(int i)
	}
	brd->brd_sector_size = rd_blksize;

+	brd->brd_logical_sector_shift = ilog2(rd_logical_blksize);
+	if ((1ULL << brd->brd_logical_sector_shift) != rd_logical_blksize) {
+		pr_err("rd_logical_blksize %d is not supported\n",
+		       rd_logical_blksize);
+		err = -EINVAL;
+		goto out_free_dev;
+	}
+	if (rd_logical_blksize > rd_blksize) {
+		pr_err("rd_logical_blksize %d larger than rd_blksize %d\n",
+		       rd_logical_blksize, rd_blksize);
+		err = -EINVAL;
+		goto out_free_dev;
+	}
+	brd->brd_logical_sector_size = rd_logical_blksize;
+
	xa_init(&brd->brd_folios);

	snprintf(buf, DISK_NAME_LEN, "ram%d", i);
@@ -393,6 +414,7 @@ static int brd_alloc(int i)
	set_capacity(disk, rd_size * 2);

	blk_queue_physical_block_size(disk->queue, rd_blksize);
+	blk_queue_logical_block_size(disk->queue, rd_logical_blksize);
	blk_queue_max_hw_sectors(disk->queue, 1ULL << (MAX_ORDER + PAGE_SECTORS_SHIFT));

	/* Tell the block layer that this is not a rotational device */

From patchwork Wed Jun 14 11:46:36 2023
X-Patchwork-Id: 13279967
From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 6/7] mm/filemap: allocate folios with mapping blocksize
Date: Wed, 14 Jun 2023 13:46:36 +0200
Message-Id: <20230614114637.89759-7-hare@suse.de>

The mapping has an underlying blocksize (by virtue of
mapping->host->i_blkbits), so if the mapping blocksize is larger than
the pagesize we should allocate folios in the correct order.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Matthew Wilcox (Oracle)
---
 include/linux/pagemap.h | 7 +++++++
 mm/filemap.c            | 7 ++++---
 mm/readahead.c          | 6 +++---
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716953ee1ebd..9ea1a9724d64 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -494,6 +494,13 @@ static inline gfp_t readahead_gfp_mask(struct address_space *x)
	return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
 }

+static inline int mapping_get_order(struct address_space *x)
+{
+	if (x->host->i_blkbits > PAGE_SHIFT)
+		return x->host->i_blkbits - PAGE_SHIFT;
+	return 0;
+}
+
 typedef int filler_t(struct file *, struct folio *);

 pgoff_t page_cache_next_miss(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 4be20e82e4c3..6f08d04995d9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1936,7 +1936,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
			gfp |= GFP_NOWAIT | __GFP_NOWARN;
		}

-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp, mapping_get_order(mapping));
		if (!folio)
			return ERR_PTR(-ENOMEM);

@@ -2495,7 +2495,8 @@ static int filemap_create_folio(struct file *file,
	struct folio *folio;
	int error;

-	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
+	folio = filemap_alloc_folio(mapping_gfp_mask(mapping),
+				    mapping_get_order(mapping));
	if (!folio)
		return -ENOMEM;

@@ -3646,7 +3647,7 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 repeat:
	folio = filemap_get_folio(mapping, index);
	if (IS_ERR(folio)) {
-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp, mapping_get_order(mapping));
		if (!folio)
			return ERR_PTR(-ENOMEM);
		err = filemap_add_folio(mapping, folio, index, gfp);
diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..031935b78af7 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -245,7 +245,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
			continue;
		}

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, mapping_get_order(mapping));
		if (!folio)
			break;
		if (filemap_add_folio(mapping, folio, index + i,
@@ -806,7 +806,7 @@ void readahead_expand(struct readahead_control *ractl,
		if (folio && !xa_is_value(folio))
			return; /* Folio apparently present */

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, mapping_get_order(mapping));
		if (!folio)
			return;
		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
@@ -833,7 +833,7 @@ void readahead_expand(struct readahead_control *ractl,
		if (folio && !xa_is_value(folio))
			return; /* Folio apparently present */

-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, mapping_get_order(mapping));
		if (!folio)
			return;
		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {

From patchwork Wed Jun 14 11:46:37 2023
X-Patchwork-Id: 13279969
From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, Andrew Morton, Christoph Hellwig, Luis Chamberlain, Hannes Reinecke
Subject: [PATCH 7/7] mm/readahead: align readahead down to mapping blocksize
Date: Wed, 14 Jun 2023 13:46:37 +0200
Message-Id: <20230614114637.89759-8-hare@suse.de>

If the blocksize of the mapping is larger than the page size we need
to align down readahead to avoid reading past the end of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 mm/readahead.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/readahead.c b/mm/readahead.c
index 031935b78af7..91a7dbf4fa04 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -285,6 +285,7 @@ static void do_page_cache_ra(struct readahead_control *ractl,
	struct inode *inode = ractl->mapping->host;
	unsigned long index = readahead_index(ractl);
	loff_t isize = i_size_read(inode);
+	unsigned int iblksize = i_blocksize(inode);
	pgoff_t end_index;	/* The last page we want to read */

	if (isize == 0)
@@ -293,6 +294,9 @@ static void do_page_cache_ra(struct readahead_control *ractl,
	end_index = (isize - 1) >> PAGE_SHIFT;
	if (index > end_index)
		return;
+	if (iblksize > PAGE_SIZE)
+		end_index = ALIGN_DOWN(end_index, iblksize);
+
	/* Don't read past the page containing the last byte of the file */
	if (nr_to_read > end_index - index)
		nr_to_read = end_index - index + 1;