From patchwork Mon Sep 11 13:34:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13380098
From: Pankaj Raghav
To: minchan@kernel.org, senozhatsky@chromium.org
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, p.raghav@samsung.com, linux-block@vger.kernel.org, kernel@pankajraghav.com, gost.dev@samsung.com
Subject: [PATCH 1/5] zram: move index preparation to a separate function in writeback_store
Date: Mon, 11 Sep 2023 15:34:26 +0200
Message-Id: <20230911133430.1824564-2-kernel@pankajraghav.com>
In-Reply-To: <20230911133430.1824564-1-kernel@pankajraghav.com>
References: <20230911133430.1824564-1-kernel@pankajraghav.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pankaj Raghav

Add a new function, writeback_prep_or_skip_index(), that performs the
checks and sets the appropriate flags before writeback starts. The
function returns false if the index can be skipped.

Signed-off-by: Pankaj Raghav
---
 drivers/block/zram/zram_drv.c | 68 +++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 06673c6ca255..eaf9e227778e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -595,6 +595,46 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 #define IDLE_WRITEBACK			(1<<1)
 #define INCOMPRESSIBLE_WRITEBACK	(1<<2)
 
+/*
+ * Returns: true if the index was prepared for further processing
+ *          false if the index can be skipped
+ */
+static bool writeback_prep_or_skip_index(struct zram *zram, int mode,
+					 unsigned long index)
+{
+	bool ret = false;
+
+	zram_slot_lock(zram, index);
+	if (!zram_allocated(zram, index))
+		goto skip;
+
+	if (zram_test_flag(zram, index, ZRAM_WB) ||
+	    zram_test_flag(zram, index, ZRAM_SAME) ||
+	    zram_test_flag(zram, index, ZRAM_UNDER_WB))
+		goto skip;
+
+	if (mode & IDLE_WRITEBACK && !zram_test_flag(zram, index, ZRAM_IDLE))
+		goto skip;
+	if (mode & HUGE_WRITEBACK && !zram_test_flag(zram, index, ZRAM_HUGE))
+		goto skip;
+	if (mode & INCOMPRESSIBLE_WRITEBACK &&
+	    !zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE))
+		goto skip;
+
+	/*
+	 * Clearing ZRAM_UNDER_WB is duty of caller.
+	 * IOW, zram_free_page never clear it.
+	 */
+	zram_set_flag(zram, index, ZRAM_UNDER_WB);
+	/* Need for hugepage writeback racing */
+	zram_set_flag(zram, index, ZRAM_IDLE);
+
+	ret = true;
+skip:
+	zram_slot_unlock(zram, index);
+	return ret;
+}
+
 static ssize_t writeback_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -662,33 +702,9 @@ static ssize_t writeback_store(struct device *dev,
 			}
 		}
 
-		zram_slot_lock(zram, index);
-		if (!zram_allocated(zram, index))
-			goto next;
+		if (!writeback_prep_or_skip_index(zram, mode, index))
+			continue;
 
-		if (zram_test_flag(zram, index, ZRAM_WB) ||
-				zram_test_flag(zram, index, ZRAM_SAME) ||
-				zram_test_flag(zram, index, ZRAM_UNDER_WB))
-			goto next;
-
-		if (mode & IDLE_WRITEBACK &&
-			  !zram_test_flag(zram, index, ZRAM_IDLE))
-			goto next;
-		if (mode & HUGE_WRITEBACK &&
-			  !zram_test_flag(zram, index, ZRAM_HUGE))
-			goto next;
-		if (mode & INCOMPRESSIBLE_WRITEBACK &&
-			  !zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE))
-			goto next;
-
-		/*
-		 * Clearing ZRAM_UNDER_WB is duty of caller.
-		 * IOW, zram_free_page never clear it.
-		 */
-		zram_set_flag(zram, index, ZRAM_UNDER_WB);
-		/* Need for hugepage writeback racing */
-		zram_set_flag(zram, index, ZRAM_IDLE);
-		zram_slot_unlock(zram, index);
 		if (zram_read_page(zram, page, index, NULL)) {
 			zram_slot_lock(zram, index);
 			zram_clear_flag(zram, index, ZRAM_UNDER_WB);

From patchwork Mon Sep 11 13:34:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13380097
From: Pankaj Raghav
To: minchan@kernel.org, senozhatsky@chromium.org
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, p.raghav@samsung.com, linux-block@vger.kernel.org, kernel@pankajraghav.com, gost.dev@samsung.com
Subject: [PATCH 2/5] zram: encapsulate writeback to the backing bdev in a function
Date: Mon, 11 Sep 2023 15:34:27 +0200
Message-Id: <20230911133430.1824564-3-kernel@pankajraghav.com>
In-Reply-To: <20230911133430.1824564-1-kernel@pankajraghav.com>
References: <20230911133430.1824564-1-kernel@pankajraghav.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pankaj Raghav

Move the code that flushes data to the backing bdev during writeback
into a separate function, writeback_flush_to_bdev(). This prepares for
adding batched IO support to writeback_store().

No functional changes.
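Patches 1 and 2 move the per-slot bookkeeping into writeback_prep_or_skip_index() and writeback_flush_to_bdev(). That bookkeeping can be modeled in user space as a small state machine over the slot flags. The following is a sketch under stated assumptions, not driver code. struct slot, the SLOT_* and MODE_* constants, prep_or_skip(), and finish_writeback() are hypothetical stand-ins. Slot locking and the actual bio are omitted, and the driver's zram_free_page() call is modeled as simply leaving only the WB flag set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the zram slot flags. */
enum {
	SLOT_WB       = 1 << 0,
	SLOT_SAME     = 1 << 1,
	SLOT_UNDER_WB = 1 << 2,
	SLOT_IDLE     = 1 << 3,
	SLOT_HUGE     = 1 << 4,
};

/* Hypothetical writeback modes, analogous to HUGE_WRITEBACK/IDLE_WRITEBACK. */
enum {
	MODE_HUGE = 1 << 0,
	MODE_IDLE = 1 << 1,
};

struct slot {
	bool allocated;
	unsigned int flags;
	unsigned long element;	/* backing-device block after writeback */
};

/* Mirrors writeback_prep_or_skip_index(): true means "write this slot back". */
static bool prep_or_skip(struct slot *s, int mode)
{
	if (!s->allocated)
		return false;
	if (s->flags & (SLOT_WB | SLOT_SAME | SLOT_UNDER_WB))
		return false;
	if ((mode & MODE_IDLE) && !(s->flags & SLOT_IDLE))
		return false;
	if ((mode & MODE_HUGE) && !(s->flags & SLOT_HUGE))
		return false;
	/* From here on, the caller is responsible for clearing SLOT_UNDER_WB. */
	s->flags |= SLOT_UNDER_WB | SLOT_IDLE;
	return true;
}

/*
 * Mirrors the bookkeeping in writeback_flush_to_bdev() once the bio has
 * completed with status io_err. Returns true if blk_idx now holds the data.
 */
static bool finish_writeback(struct slot *s, int io_err, unsigned long blk_idx)
{
	assert(s->flags & SLOT_UNDER_WB);

	/*
	 * Failed write, or the slot was freed or rewritten while unlocked
	 * (which clears SLOT_IDLE): keep the data in RAM.
	 */
	if (io_err || !s->allocated || !(s->flags & SLOT_IDLE)) {
		s->flags &= ~(unsigned int)(SLOT_UNDER_WB | SLOT_IDLE);
		return false;
	}

	/* Success: drop the in-RAM copy; only WB remains set in this model. */
	s->flags = SLOT_WB;
	s->element = blk_idx;
	return true;
}
```

Setting SLOT_IDLE in prep_or_skip() lets finish_writeback() detect a concurrent free or rewrite, because either event clears the bit before the slot is looked at again. This matches the "close the race by checking ZRAM_IDLE bit" comment in the driver.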
Signed-off-by: Pankaj Raghav
---
 drivers/block/zram/zram_drv.c | 125 ++++++++++++++++++----------------
 1 file changed, 68 insertions(+), 57 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index eaf9e227778e..bd93ed653b99 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -635,14 +635,78 @@ static bool writeback_prep_or_skip_index(struct zram *zram, int mode,
 	return ret;
 }
 
+static int writeback_flush_to_bdev(struct zram *zram, unsigned long index,
+				   struct page *page, unsigned long *blk_idx)
+{
+	struct bio bio;
+	struct bio_vec bio_vec;
+	int ret;
+
+	bio_init(&bio, zram->bdev, &bio_vec, 1, REQ_OP_WRITE | REQ_SYNC);
+	bio.bi_iter.bi_sector = *blk_idx * (PAGE_SIZE >> 9);
+	__bio_add_page(&bio, page, PAGE_SIZE, 0);
+
+	/*
+	 * XXX: A single page IO would be inefficient for write
+	 * but it would be not bad as starter.
+	 */
+	ret = submit_bio_wait(&bio);
+	if (ret) {
+		zram_slot_lock(zram, index);
+		zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+		zram_clear_flag(zram, index, ZRAM_IDLE);
+		zram_slot_unlock(zram, index);
+		/*
+		 * BIO errors are not fatal, we continue and simply
+		 * attempt to writeback the remaining objects (pages).
+		 * At the same time we need to signal user-space that
+		 * some writes (at least one, but also could be all of
+		 * them) were not successful and we do so by returning
+		 * the most recent BIO error.
+		 */
+		return ret;
+	}
+
+	atomic64_inc(&zram->stats.bd_writes);
+	/*
+	 * We released zram_slot_lock so need to check if the slot was
+	 * changed. If there is freeing for the slot, we can catch it
+	 * easily by zram_allocated.
+	 * A subtle case is the slot is freed/reallocated/marked as
+	 * ZRAM_IDLE again. To close the race, idle_store doesn't
+	 * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB.
+	 * Thus, we could close the race by checking ZRAM_IDLE bit.
+	 */
+	zram_slot_lock(zram, index);
+	if (!zram_allocated(zram, index) ||
+	    !zram_test_flag(zram, index, ZRAM_IDLE)) {
+		zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+		zram_clear_flag(zram, index, ZRAM_IDLE);
+		goto skip;
+	}
+
+	zram_free_page(zram, index);
+	zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+	zram_set_flag(zram, index, ZRAM_WB);
+	zram_set_element(zram, index, *blk_idx);
+	atomic64_inc(&zram->stats.pages_stored);
+	*blk_idx = 0;
+
+	spin_lock(&zram->wb_limit_lock);
+	if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
+		zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
+	spin_unlock(&zram->wb_limit_lock);
+skip:
+	zram_slot_unlock(zram, index);
+	return 0;
+}
+
 static ssize_t writeback_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
 	struct zram *zram = dev_to_zram(dev);
 	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
 	unsigned long index = 0;
-	struct bio bio;
-	struct bio_vec bio_vec;
 	struct page *page;
 	ssize_t ret = len;
 	int mode, err;
@@ -713,63 +777,10 @@ static ssize_t writeback_store(struct device *dev,
 			continue;
 		}
 
-		bio_init(&bio, zram->bdev, &bio_vec, 1,
-			 REQ_OP_WRITE | REQ_SYNC);
-		bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
-		__bio_add_page(&bio, page, PAGE_SIZE, 0);
+		err = writeback_flush_to_bdev(zram, index, page, &blk_idx);
 
-		/*
-		 * XXX: A single page IO would be inefficient for write
-		 * but it would be not bad as starter.
-		 */
-		err = submit_bio_wait(&bio);
-		if (err) {
-			zram_slot_lock(zram, index);
-			zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-			zram_clear_flag(zram, index, ZRAM_IDLE);
-			zram_slot_unlock(zram, index);
-			/*
-			 * BIO errors are not fatal, we continue and simply
-			 * attempt to writeback the remaining objects (pages).
-			 * At the same time we need to signal user-space that
-			 * some writes (at least one, but also could be all of
-			 * them) were not successful and we do so by returning
-			 * the most recent BIO error.
-			 */
+		if (err)
 			ret = err;
-			continue;
-		}
-
-		atomic64_inc(&zram->stats.bd_writes);
-		/*
-		 * We released zram_slot_lock so need to check if the slot was
-		 * changed. If there is freeing for the slot, we can catch it
-		 * easily by zram_allocated.
-		 * A subtle case is the slot is freed/reallocated/marked as
-		 * ZRAM_IDLE again. To close the race, idle_store doesn't
-		 * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB.
-		 * Thus, we could close the race by checking ZRAM_IDLE bit.
-		 */
-		zram_slot_lock(zram, index);
-		if (!zram_allocated(zram, index) ||
-				!zram_test_flag(zram, index, ZRAM_IDLE)) {
-			zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-			zram_clear_flag(zram, index, ZRAM_IDLE);
-			goto next;
-		}
-
-		zram_free_page(zram, index);
-		zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-		zram_set_flag(zram, index, ZRAM_WB);
-		zram_set_element(zram, index, blk_idx);
-		blk_idx = 0;
-		atomic64_inc(&zram->stats.pages_stored);
-		spin_lock(&zram->wb_limit_lock);
-		if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
-			zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
-		spin_unlock(&zram->wb_limit_lock);
-next:
-		zram_slot_unlock(zram, index);
 	}
 
 	if (blk_idx)

From patchwork Mon Sep 11 13:34:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13380096
From: Pankaj Raghav
To: minchan@kernel.org, senozhatsky@chromium.org
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, p.raghav@samsung.com, linux-block@vger.kernel.org, kernel@pankajraghav.com, gost.dev@samsung.com
Subject: [PATCH 3/5] zram: add alloc_block_bdev_range() and free_block_bdev_range()
Date: Mon, 11 Sep 2023 15:34:28 +0200
Message-Id: <20230911133430.1824564-4-kernel@pankajraghav.com>
In-Reply-To: <20230911133430.1824564-1-kernel@pankajraghav.com>
References: <20230911133430.1824564-1-kernel@pankajraghav.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pankaj Raghav

Add [alloc|free]_block_bdev_range(), which take the number of blocks
to allocate from or free to the block
bitmap.

alloc_block_bdev_range() tries to allocate a contiguous range of
nr_of_blocks blocks from the bitmap whenever possible. Otherwise, it
retries with half as many blocks, down to a single block. This avoids
returning -ENOSPC unnecessarily when the backing device is fragmented.

alloc_block_bdev_range() is not an atomic operation. It can only be
called from writeback_store(), which holds init_lock, so two processes
cannot allocate from the bdev bitmap at the same time.

free_block_bdev_range() is a simple loop that calls the atomic
free_block_bdev() function. Two different processes can free blocks
from the bdev bitmap simultaneously without a lock, so atomicity must
be maintained there.

This is useful when we want to send larger IOs to the backing device.

Signed-off-by: Pankaj Raghav
---
 drivers/block/zram/zram_drv.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bd93ed653b99..0b8f814e11dd 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -576,6 +576,39 @@ static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
 	atomic64_dec(&zram->stats.bd_count);
 }
 
+static unsigned long alloc_block_bdev_range(struct zram *zram,
+					    unsigned int *nr_of_blocksp)
+{
+	unsigned long blk_idx;
+	unsigned int nr_of_blocks = *nr_of_blocksp;
+retry:
+	/* skip 0 bit to confuse zram.handle = 0 */
+	blk_idx = 1;
+	blk_idx = bitmap_find_next_zero_area(zram->bitmap, zram->nr_pages,
+					     blk_idx, nr_of_blocks, 0);
+
+	if ((blk_idx + nr_of_blocks) > zram->nr_pages) {
+		if (nr_of_blocks == 1)
+			return 0;
+
+		nr_of_blocks = nr_of_blocks / 2;
+		goto retry;
+	}
+
+	bitmap_set(zram->bitmap, blk_idx, nr_of_blocks);
+	atomic64_add(nr_of_blocks, &zram->stats.bd_count);
+	*nr_of_blocksp = nr_of_blocks;
+
+	return blk_idx;
+}
+
+static void free_block_bdev_range(struct zram *zram, unsigned long blk_idx,
+				  unsigned int nr_of_blocks)
+{
+	for (unsigned int i = 0; i < nr_of_blocks; i++)
+		free_block_bdev(zram, blk_idx + i);
+}
+
 static void read_from_bdev_async(struct zram *zram, struct page *page,
 			unsigned long entry, struct bio *parent)
 {

From patchwork Mon Sep 11 13:34:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13380099
From: Pankaj Raghav
To: minchan@kernel.org, senozhatsky@chromium.org
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, p.raghav@samsung.com, linux-block@vger.kernel.org, kernel@pankajraghav.com, gost.dev@samsung.com
Subject: [PATCH 4/5] zram: batch IOs during writeback to improve performance
Date: Mon, 11 Sep 2023 15:34:29 +0200
Message-Id: <20230911133430.1824564-5-kernel@pankajraghav.com>
In-Reply-To: <20230911133430.1824564-1-kernel@pankajraghav.com>
References: <20230911133430.1824564-1-kernel@pankajraghav.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pankaj Raghav

This crosses off one of the TODOs in writeback_store():
"A single page IO would be inefficient for write..."

This reduces the time to write back 4G of data to an NVMe backing device
from 68 seconds to 15 seconds (more than a 4x improvement).

The idea is to batch IOs up to a certain limit before the data is
flushed to the backing device. The batch limit is initially chosen
based on the bdi->io_pages value (rounded down to a power of two), with
an upper limit of 32 pages (128k on x86). If writeback_limit is set,
the batch limit is further reduced to stay within it.
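The batch-size choice and the flush-when-full loop described above can be sketched in user space as follows. batch_pages(), simulate_writeback(), ilog2_u32(), and struct index_batch are hypothetical. The sketch performs no I/O and only counts flushes; each flush stands for one synchronous multi-page bio in the driver. It also flushes a trailing partial batch unconditionally, which simplifies how the patch handles leftovers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INDEX_ENTRIES_ORDER 5
#define MAX_INDEX_ENTRIES (1u << MAX_INDEX_ENTRIES_ORDER)

/* floor(log2(v)) for v != 0, like the kernel's ilog2(). */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Batch size following the patched writeback_store(): a power of two
 * derived from io_pages, capped at MAX_INDEX_ENTRIES, then clamped to the
 * remaining pages and to the writeback limit (in pages).
 */
static unsigned int batch_pages(uint32_t bdi_io_pages, unsigned long remaining,
				uint64_t wb_limit_pages)
{
	unsigned int order = ilog2_u32(bdi_io_pages);
	uint64_t n;

	if (order > MAX_INDEX_ENTRIES_ORDER)
		order = MAX_INDEX_ENTRIES_ORDER;
	n = (uint64_t)1 << order;
	if (n > remaining)
		n = remaining;
	if (n > wb_limit_pages)
		n = wb_limit_pages;
	return (unsigned int)n;
}

/* Plays the role of struct index_mapping in the patch. */
struct index_batch {
	unsigned long arr[MAX_INDEX_ENTRIES];
	unsigned int nr;
};

/*
 * Queue each eligible index and count a flush whenever the batch is full,
 * plus one more for a trailing partial batch.
 */
static unsigned int simulate_writeback(const int *eligible, size_t n,
				       unsigned int batch)
{
	struct index_batch map = { .nr = 0 };
	unsigned int flushes = 0;

	assert(batch > 0 && batch <= MAX_INDEX_ENTRIES);
	for (size_t i = 0; i < n; i++) {
		if (!eligible[i])
			continue;
		map.arr[map.nr++] = i;
		if (map.nr == batch) {
			flushes++;	/* one bio covering map.nr pages */
			map.nr = 0;
		}
	}
	if (map.nr)
		flushes++;
	return flushes;
}
```

With 100 eligible indices and a 32-page batch, the model issues 4 flushes, compared with 100 single-page submissions when the batch size is 1.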
Signed-off-by: Pankaj Raghav
---
 drivers/block/zram/zram_drv.c | 186 +++++++++++++++++++++-------------
 1 file changed, 113 insertions(+), 73 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0b8f814e11dd..27313c2d781d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -551,22 +551,6 @@ static ssize_t backing_dev_store(struct device *dev,
 	return err;
 }
 
-static unsigned long alloc_block_bdev(struct zram *zram)
-{
-	unsigned long blk_idx = 1;
-retry:
-	/* skip 0 bit to confuse zram.handle = 0 */
-	blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, blk_idx);
-	if (blk_idx == zram->nr_pages)
-		return 0;
-
-	if (test_and_set_bit(blk_idx, zram->bitmap))
-		goto retry;
-
-	atomic64_inc(&zram->stats.bd_count);
-	return blk_idx;
-}
-
 static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
 {
 	int was_set;
@@ -628,6 +612,15 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 #define IDLE_WRITEBACK			(1<<1)
 #define INCOMPRESSIBLE_WRITEBACK	(1<<2)
 
+#define MAX_INDEX_ENTRIES_ORDER 5
+#define MAX_INDEX_ENTRIES (1U << MAX_INDEX_ENTRIES_ORDER)
+struct index_mapping {
+	/* Cap the maximum indices to 32 before we flush */
+	unsigned long arr[MAX_INDEX_ENTRIES];
+	unsigned int nr_of_entries;
+};
+
+
 /*
  * Returns: true if the index was prepared for further processing
  *          false if the index can be skipped
@@ -668,39 +661,36 @@ static bool writeback_prep_or_skip_index(struct zram *zram, int mode,
 	return ret;
 }
 
-static int writeback_flush_to_bdev(struct zram *zram, unsigned long index,
-				   struct page *page, unsigned long *blk_idx)
+static int writeback_flush_to_bdev(struct zram *zram, struct folio *folio,
+				   struct index_mapping *map,
+				   unsigned long blk_idx, unsigned int io_pages)
 {
 	struct bio bio;
 	struct bio_vec bio_vec;
-	int ret;
+	int ret = 0;
+
+	if (!map->nr_of_entries)
+		return ret;
 
 	bio_init(&bio, zram->bdev, &bio_vec, 1, REQ_OP_WRITE | REQ_SYNC);
-	bio.bi_iter.bi_sector = *blk_idx * (PAGE_SIZE >> 9);
-	__bio_add_page(&bio, page, PAGE_SIZE, 0);
+	bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
+
+	if (!bio_add_folio(&bio, folio, io_pages * PAGE_SIZE, 0))
+		goto cleanup;
 
-	/*
-	 * XXX: A single page IO would be inefficient for write
-	 * but it would be not bad as starter.
-	 */
 	ret = submit_bio_wait(&bio);
-	if (ret) {
-		zram_slot_lock(zram, index);
-		zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-		zram_clear_flag(zram, index, ZRAM_IDLE);
-		zram_slot_unlock(zram, index);
-		/*
-		 * BIO errors are not fatal, we continue and simply
-		 * attempt to writeback the remaining objects (pages).
-		 * At the same time we need to signal user-space that
-		 * some writes (at least one, but also could be all of
-		 * them) were not successful and we do so by returning
-		 * the most recent BIO error.
-		 */
-		return ret;
-	}
+	/*
+	 * BIO errors are not fatal, we continue and simply
+	 * attempt to writeback the remaining objects (pages).
+	 * At the same time we need to signal user-space that
+	 * some writes (at least one, but also could be all of
+	 * them) were not successful and we do so by returning
+	 * the most recent BIO error.
+	 */
+	if (ret)
+		goto cleanup;
 
-	atomic64_inc(&zram->stats.bd_writes);
+	atomic64_add(map->nr_of_entries, &zram->stats.bd_writes);
 	/*
 	 * We released zram_slot_lock so need to check if the slot was
 	 * changed. If there is freeing for the slot, we can catch it
@@ -710,28 +700,40 @@ static int writeback_flush_to_bdev(struct zram *zram, unsigned long index,
 	 * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB.
 	 * Thus, we could close the race by checking ZRAM_IDLE bit.
 	 */
-	zram_slot_lock(zram, index);
-	if (!zram_allocated(zram, index) ||
-	    !zram_test_flag(zram, index, ZRAM_IDLE)) {
-		zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-		zram_clear_flag(zram, index, ZRAM_IDLE);
-		goto skip;
+	for (int iter = 0; iter < map->nr_of_entries; iter++) {
+		zram_slot_lock(zram, map->arr[iter]);
+		if (!zram_allocated(zram, map->arr[iter]) ||
+		    !zram_test_flag(zram, map->arr[iter], ZRAM_IDLE)) {
+			zram_clear_flag(zram, map->arr[iter], ZRAM_UNDER_WB);
+			zram_clear_flag(zram, map->arr[iter], ZRAM_IDLE);
+			zram_slot_unlock(zram, map->arr[iter]);
+			free_block_bdev(zram, blk_idx + iter);
+			continue;
+		}
+
+		zram_free_page(zram, map->arr[iter]);
+		zram_clear_flag(zram, map->arr[iter], ZRAM_UNDER_WB);
+		zram_set_flag(zram, map->arr[iter], ZRAM_WB);
+		zram_set_element(zram, map->arr[iter], blk_idx + iter);
+		zram_slot_unlock(zram, map->arr[iter]);
+		atomic64_inc(&zram->stats.pages_stored);
+
+		spin_lock(&zram->wb_limit_lock);
+		if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
+			zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
+		spin_unlock(&zram->wb_limit_lock);
 	}
+	return ret;
 
-	zram_free_page(zram, index);
-	zram_clear_flag(zram, index, ZRAM_UNDER_WB);
-	zram_set_flag(zram, index, ZRAM_WB);
-	zram_set_element(zram, index, *blk_idx);
-	atomic64_inc(&zram->stats.pages_stored);
-	*blk_idx = 0;
-
-	spin_lock(&zram->wb_limit_lock);
-	if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
-		zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
-	spin_unlock(&zram->wb_limit_lock);
-skip:
-	zram_slot_unlock(zram, index);
-	return 0;
+cleanup:
+	for (int iter = 0; iter < map->nr_of_entries; iter++) {
+		zram_slot_lock(zram, map->arr[iter]);
+		zram_clear_flag(zram, map->arr[iter], ZRAM_UNDER_WB);
+		zram_clear_flag(zram, map->arr[iter], ZRAM_IDLE);
+		zram_slot_unlock(zram, map->arr[iter]);
+	}
+	free_block_bdev_range(zram, blk_idx, map->nr_of_entries);
+	return ret;
 }
 
 static ssize_t writeback_store(struct device *dev,
@@ -741,9 +743,15 @@ static ssize_t writeback_store(struct device *dev,
 	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
 	unsigned long index = 0;
 	struct page *page;
+	struct folio *folio;
 	ssize_t ret = len;
 	int mode, err;
 	unsigned long blk_idx = 0;
+	unsigned int io_pages;
+	u64 bd_wb_limit_pages = ULONG_MAX;
+	struct index_mapping map = {};
+	unsigned int order = min(MAX_INDEX_ENTRIES_ORDER,
+				 ilog2(zram->bdev->bd_disk->bdi->io_pages));
 
 	if (sysfs_streq(buf, "idle"))
 		mode = IDLE_WRITEBACK;
@@ -776,32 +784,48 @@ static ssize_t writeback_store(struct device *dev,
 		goto release_init_lock;
 	}
 
-	page = alloc_page(GFP_KERNEL);
-	if (!page) {
+	folio = folio_alloc(GFP_KERNEL, order);
+	if (!folio) {
 		ret = -ENOMEM;
 		goto release_init_lock;
 	}
 
 	for (; nr_pages != 0; index++, nr_pages--) {
 		spin_lock(&zram->wb_limit_lock);
-		if (zram->wb_limit_enable && !zram->bd_wb_limit) {
-			spin_unlock(&zram->wb_limit_lock);
-			ret = -EIO;
-			break;
+		if (zram->wb_limit_enable) {
+			if (!zram->bd_wb_limit) {
+				spin_unlock(&zram->wb_limit_lock);
+				ret = -EIO;
+				break;
+			}
+			bd_wb_limit_pages = zram->bd_wb_limit >>
+					    (PAGE_SHIFT - 12);
 		}
 		spin_unlock(&zram->wb_limit_lock);
 
 		if (!blk_idx) {
-			blk_idx = alloc_block_bdev(zram);
+			io_pages = min(1UL << order, nr_pages);
+			io_pages = min_t(u64, bd_wb_limit_pages, io_pages);
+
+			blk_idx = alloc_block_bdev_range(zram, &io_pages);
 			if (!blk_idx) {
 				ret = -ENOSPC;
 				break;
 			}
 		}
 
-		if (!writeback_prep_or_skip_index(zram, mode, index))
-			continue;
+		if (!writeback_prep_or_skip_index(zram, mode, index)) {
+			if (nr_pages > 1 || map.nr_of_entries == 0)
+				continue;
+			/* There are still some unfinished IOs that
+			 * need to be flushed
+			 */
+			err = writeback_flush_to_bdev(zram, folio, &map,
+						      blk_idx, io_pages);
+			goto next;
+		}
 
+		page = folio_page(folio, map.nr_of_entries);
 		if (zram_read_page(zram, page, index, NULL)) {
 			zram_slot_lock(zram, index);
 			zram_clear_flag(zram, index, ZRAM_UNDER_WB);
@@ -810,15 +834,31 @@ static ssize_t writeback_store(struct device *dev,
 			continue;
 		}
 
-		err = writeback_flush_to_bdev(zram, index, page, &blk_idx);
+		map.arr[map.nr_of_entries++] = index;
+		if (map.nr_of_entries < io_pages)
+			continue;
 
+		err = writeback_flush_to_bdev(zram, folio, &map, blk_idx,
+					      io_pages);
+next:
 		if (err)
 			ret = err;
+
+		/*
+		 * Check if all the blocks have been utilized before
+		 * allocating the next batch. This is necessary to free
+		 * the unused blocks after looping through all indices.
+		 */
+		if (map.nr_of_entries == io_pages) {
+			blk_idx = 0;
+			map.nr_of_entries = 0;
+		}
 	}
 
 	if (blk_idx)
-		free_block_bdev(zram, blk_idx);
-	__free_page(page);
+		free_block_bdev_range(zram, blk_idx + map.nr_of_entries,
+				      io_pages - map.nr_of_entries);
+	folio_put(folio);
 
 release_init_lock:
 	up_read(&zram->init_lock);

From patchwork Mon Sep 11 13:34:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)"
X-Patchwork-Id: 13380100
From: Pankaj Raghav
To: minchan@kernel.org, senozhatsky@chromium.org
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, p.raghav@samsung.com, linux-block@vger.kernel.org, kernel@pankajraghav.com, gost.dev@samsung.com
Subject: [PATCH 5/5] zram: don't overload blk_idx variable in writeback_store()
Date: Mon, 11 Sep 2023 15:34:30 +0200
Message-Id: <20230911133430.1824564-6-kernel@pankajraghav.com>
In-Reply-To: <20230911133430.1824564-1-kernel@pankajraghav.com>
References: <20230911133430.1824564-1-kernel@pankajraghav.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pankaj Raghav

Instead of overloading the blk_idx variable to track whether a block
range has been allocated and is still in use, add a new boolean
variable, blk_allocated.

No functional changes.
Signed-off-by: Pankaj Raghav
---
 drivers/block/zram/zram_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 27313c2d781d..7c1420e92c6a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -747,6 +747,7 @@ static ssize_t writeback_store(struct device *dev,
 	ssize_t ret = len;
 	int mode, err;
 	unsigned long blk_idx = 0;
+	bool blk_allocated = false;
 	unsigned int io_pages;
 	u64 bd_wb_limit_pages = ULONG_MAX;
 	struct index_mapping map = {};
@@ -803,7 +804,7 @@ static ssize_t writeback_store(struct device *dev,
 		}
 		spin_unlock(&zram->wb_limit_lock);
 
-		if (!blk_idx) {
+		if (!blk_allocated) {
 			io_pages = min(1UL << order, nr_pages);
 			io_pages = min_t(u64, bd_wb_limit_pages, io_pages);
 
@@ -812,6 +813,7 @@ static ssize_t writeback_store(struct device *dev,
 				ret = -ENOSPC;
 				break;
 			}
+			blk_allocated = true;
 		}
 
 		if (!writeback_prep_or_skip_index(zram, mode, index)) {
@@ -850,12 +852,12 @@ static ssize_t writeback_store(struct device *dev,
 		 * the unused blocks after looping through all indices.
 		 */
 		if (map.nr_of_entries == io_pages) {
-			blk_idx = 0;
+			blk_allocated = false;
 			map.nr_of_entries = 0;
 		}
 	}
 
-	if (blk_idx)
+	if (blk_allocated)
 		free_block_bdev_range(zram, blk_idx + map.nr_of_entries,
 				      io_pages - map.nr_of_entries);
 	folio_put(folio);
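The halving retry in alloc_block_bdev_range() from patch 3 can be exercised on a toy bitmap. This is a sketch, not the kernel implementation. struct toy_bitmap, find_zero_run() (a linear stand-in for bitmap_find_next_zero_area() with a zero alignment mask), and alloc_range() are hypothetical, and the bitmap is capped at 64 blocks. As in the driver, block 0 is never handed out, so a return value of 0 signals failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BLOCKS 64

/* Toy bitmap: one bool per backing-device block; block 0 is never handed out. */
struct toy_bitmap {
	bool used[TOY_MAX_BLOCKS];
	size_t nr_blocks;
};

/*
 * Linear stand-in for bitmap_find_next_zero_area() with a zero alignment
 * mask: first index >= start that begins nr free blocks, or nr_blocks if none.
 */
static size_t find_zero_run(const struct toy_bitmap *bm, size_t start,
			    unsigned int nr)
{
	for (size_t i = start; i + nr <= bm->nr_blocks; i++) {
		unsigned int run = 0;

		while (run < nr && !bm->used[i + run])
			run++;
		if (run == nr)
			return i;
	}
	return bm->nr_blocks;
}

/*
 * Model of alloc_block_bdev_range(): try *nrp blocks, halve on failure, and
 * return 0 once even a single block cannot be found. On success, *nrp holds
 * the number of blocks actually reserved.
 */
static size_t alloc_range(struct toy_bitmap *bm, unsigned int *nrp)
{
	unsigned int nr = *nrp;

	assert(nr > 0 && bm->nr_blocks <= TOY_MAX_BLOCKS);
	for (;;) {
		size_t blk = find_zero_run(bm, 1, nr);

		if (blk + nr <= bm->nr_blocks) {
			for (unsigned int i = 0; i < nr; i++)
				bm->used[blk + i] = true;
			*nrp = nr;
			return blk;
		}
		if (nr == 1)
			return 0;
		nr /= 2;
	}
}
```

Consider a 16-block bitmap with blocks 1 to 8 already reserved and block 12 in use. A request for 8 blocks falls back to 4, then to 2, and returns blocks 9 and 10. This is the fragmented-device case the patch 3 commit message describes.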