From patchwork Mon Sep 27 04:15:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12519057
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: David Sterba, Naohiro Aota
Subject: [PATCH 1/5] btrfs-progs: mkfs: do not set zone size on non-zoned mode
Date: Mon, 27 Sep 2021 13:15:50 +0900
Message-Id: <20210927041554.325884-2-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210927041554.325884-1-naohiro.aota@wdc.com>
References: <20210927041554.325884-1-naohiro.aota@wdc.com>
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Since zone_size() returns an emulated zone size even for a non-zoned
device, we cannot use cfg.zone_size to determine whether the device is
zoned. Set zone_size = 0 in non-zoned mode.

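The effect of this change can be sketched in isolation. The block below is
only an illustration under stated assumptions: struct cfg, fake_zone_size(),
set_zone_size() and cfg_is_zoned() are hypothetical stand-ins, not
btrfs-progs code, and the 256 MiB value is an arbitrary placeholder. It shows
that once non-zoned mode stores 0, a non-zero zone_size can be read as
"zoned", which is how patch 2/5 uses cfg->zone_size for the direct argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_mkfs_config. */
struct cfg {
	uint64_t zone_size;
};

/*
 * Hypothetical stand-in for zone_size(): as the commit message describes
 * for the real helper, it reports a non-zero size for any device.
 * The value itself is an arbitrary placeholder.
 */
static uint64_t fake_zone_size(void)
{
	return 256ULL * 1024 * 1024;
}

/* Mirrors the patched assignment: only zoned mode records a zone size. */
static void set_zone_size(struct cfg *cfg, bool zoned)
{
	assert(cfg);
	cfg->zone_size = zoned ? fake_zone_size() : 0;
}

/* With the assignment above, a non-zero zone_size implies zoned mode. */
static bool cfg_is_zoned(const struct cfg *cfg)
{
	return cfg->zone_size != 0;
}
```

Passing the 64-bit zone_size where a bool is expected relies on the usual C
conversion: any non-zero value becomes true.
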
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 mkfs/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 6f3d6ce42c5d..b925c572b2b3 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1355,7 +1355,10 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	mkfs_cfg.features = features;
 	mkfs_cfg.runtime_features = runtime_features;
 	mkfs_cfg.csum_type = csum_type;
-	mkfs_cfg.zone_size = zone_size(file);
+	if (zoned)
+		mkfs_cfg.zone_size = zone_size(file);
+	else
+		mkfs_cfg.zone_size = 0;
 
 	ret = make_btrfs(fd, &mkfs_cfg);
 	if (ret) {

From patchwork Mon Sep 27 04:15:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12519063
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: David Sterba, Naohiro Aota
Subject: [PATCH 2/5] btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
Date: Mon, 27 Sep 2021 13:15:51 +0900
Message-Id: <20210927041554.325884-3-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210927041554.325884-1-naohiro.aota@wdc.com>
References: <20210927041554.325884-1-naohiro.aota@wdc.com>
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Wrap pwrite with btrfs_pwrite().
It simply calls pwrite() on non-zoned btrfs (= opened without O_DIRECT).
In zoned mode (= opened with O_DIRECT), it allocates an aligned bounce
buffer, copies the contents into it, and uses it for direct-IO writing.

Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little
tricky. We don't have fs_info at hand there, so use zinfo to determine
whether the device is zoned.

Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 common/device-utils.c     | 76 ++++++++++++++++++++++++++++++++++++---
 common/device-utils.h     | 19 +++++++++-
 kernel-shared/extent_io.c |  7 ++--
 kernel-shared/zoned.c     |  4 +--
 mkfs/common.c             | 14 +++++---
 5 files changed, 106 insertions(+), 14 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index 503705c43754..3ba4dccba689 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "kernel-lib/sizes.h"
 #include "kernel-shared/disk-io.h"
@@ -76,7 +77,7 @@ int device_discard_blocks(int fd, u64 start, u64 len)
 /*
  * Write zeros to the given range [start, start + len)
  */
-int device_zero_blocks(int fd, off_t start, size_t len)
+int device_zero_blocks(int fd, off_t start, size_t len, const bool direct)
 {
 	char *buf = malloc(len);
 	int ret = 0;
@@ -85,7 +86,7 @@ int device_zero_blocks(int fd, off_t start, size_t len)
 	if (!buf)
 		return -ENOMEM;
 	memset(buf, 0, len);
-	written = pwrite(fd, buf, len, start);
+	written = btrfs_pwrite(fd, buf, len, start, direct);
 	if (written != len)
 		ret = -EIO;
 	free(buf);
@@ -115,7 +116,7 @@ static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo,
 	if (zinfo && zinfo->model == ZONED_HOST_MANAGED)
 		return zero_zone_blocks(fd, zinfo, start, end - start);
 
-	return device_zero_blocks(fd, start, end - start);
+	return device_zero_blocks(fd, start, end - start, false);
 }
 
 /*
@@ -157,8 +158,10 @@ static int btrfs_wipe_existing_sb(int fd, struct btrfs_zoned_device_info *zinfo)
 	len = sizeof(buf);
 
 	if (!zone_is_sequential(zinfo, offset)) {
+		const bool direct = zinfo && zinfo->model == ZONED_HOST_MANAGED;
+
 		memset(buf, 0, len);
-		ret = pwrite(fd, buf, len, offset);
+		ret = btrfs_pwrite(fd, buf, len, offset, direct);
 		if (ret < 0) {
 			error("cannot wipe existing superblock: %m");
 			ret = -1;
@@ -491,3 +494,68 @@ out:
 	close(sysfs_fd);
 	return ret;
 }
+
+ssize_t btrfs_direct_pio(int rw, int fd, void *buf, size_t count, off_t offset)
+{
+	int alignment;
+	size_t iosize;
+	void *bounce_buf = NULL;
+	struct stat stat_buf;
+	unsigned long req;
+	int ret;
+	ssize_t ret_rw;
+
+	ASSERT(rw == READ || rw == WRITE);
+
+	if (fstat(fd, &stat_buf) == -1) {
+		error("fstat failed (%m)");
+		return 0;
+	}
+
+	if ((stat_buf.st_mode & S_IFMT) == S_IFBLK)
+		req = BLKSSZGET;
+	else
+		req = FIGETBSZ;
+
+	if (ioctl(fd, req, &alignment)) {
+		error("failed to get block size: %m");
+		return 0;
+	}
+
+	if (IS_ALIGNED((size_t)buf, alignment) && IS_ALIGNED(count, alignment)) {
+		if (rw == WRITE)
+			return pwrite(fd, buf, count, offset);
+		else
+			return pread(fd, buf, count, offset);
+	}
+
+	/* Cannot do anything if the write size is not aligned */
+	if (rw == WRITE && !IS_ALIGNED(count, alignment)) {
+		error("%lu is not aligned to %d", count, alignment);
+		return 0;
+	}
+
+	iosize = round_up(count, alignment);
+
+	ret = posix_memalign(&bounce_buf, alignment, iosize);
+	if (ret) {
+		error("failed to allocate bounce buffer: %m");
+		errno = ret;
+		return 0;
+	}
+
+	if (rw == WRITE) {
+		ASSERT(iosize == count);
+		memcpy(bounce_buf, buf, count);
+		ret_rw = pwrite(fd, bounce_buf, iosize, offset);
+	} else {
+		ret_rw = pread(fd, bounce_buf, iosize, offset);
+		if (ret_rw >= count) {
+			ret_rw = count;
+			memcpy(buf, bounce_buf, count);
+		}
+	}
+
+	free(bounce_buf);
+	return ret_rw;
+}
diff --git a/common/device-utils.h b/common/device-utils.h
index 099520bf9737..767dab4370e1 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -17,6 +17,8 @@
 #ifndef __DEVICE_UTILS_H__
 #define __DEVICE_UTILS_H__
 
+#include
+#include
 #include "kerncompat.h"
 #include "sys/stat.h"
 
@@ -35,7 +37,7 @@
  * Generic block device helpers
  */
 int device_discard_blocks(int fd, u64 start, u64 len);
-int device_zero_blocks(int fd, off_t start, size_t len);
+int device_zero_blocks(int fd, off_t start, size_t len, const bool direct);
 u64 device_get_partition_size(const char *dev);
 u64 device_get_partition_size_fd(int fd);
 int device_get_queue_param(const char *file, const char *param, char *buf, size_t len);
@@ -47,5 +49,20 @@ u64 device_get_zone_size(int fd, const char *name);
 u64 btrfs_device_size(int fd, struct stat *st);
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags);
+ssize_t btrfs_direct_pio(int rw, int fd, void *buf, size_t count, off_t offset);
+
+#ifdef BTRFS_ZONED
+static inline ssize_t btrfs_pwrite(int fd, void *buf, size_t count,
+				   off_t offset, bool direct)
+{
+	if (!direct)
+		return pwrite(fd, buf, count, offset);
+
+	return btrfs_direct_pio(WRITE, fd, buf, count, offset);
+}
+#else
+#define btrfs_pwrite(fd, buf, count, offset, direct) \
+	({ (void)(direct); pwrite(fd, buf, count, offset); })
+#endif
 
 #endif
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index d3d79bc8f748..b5984949f431 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -29,6 +29,7 @@
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/volumes.h"
 #include "common/utils.h"
+#include "common/device-utils.h"
 #include "common/internal.h"
 
 void extent_io_tree_init(struct extent_io_tree *tree)
@@ -809,7 +810,8 @@ out:
 int write_extent_to_disk(struct extent_buffer *eb)
 {
 	int ret;
-	ret = pwrite(eb->fd, eb->data, eb->len, eb->dev_bytenr);
+	ret = btrfs_pwrite(eb->fd, eb->data, eb->len, eb->dev_bytenr,
+			   eb->fs_info->zoned);
 	if (ret < 0)
 		goto out;
 	if (ret != eb->len) {
@@ -932,7 +934,8 @@ int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
 			this_len = min(this_len, bytes_left);
 			dev_nr++;
 
-			ret = pwrite(device->fd, buf + total_write, this_len, dev_bytenr);
+			ret = btrfs_pwrite(device->fd, buf + total_write,
+					   this_len, dev_bytenr, info->zoned);
 			if (ret != this_len) {
 				if (ret < 0) {
 					fprintf(stderr, "Error writing to "
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 8d94f98a7fce..c2cce3b5f366 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -424,7 +424,7 @@ int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
 		count = zone_len - (ofst & (zone_len - 1));
 
 		if (!zone_is_sequential(zinfo, ofst)) {
-			ret = device_zero_blocks(fd, ofst, count);
+			ret = device_zero_blocks(fd, ofst, count, true);
 			if (ret != 0)
 				return ret;
 		}
@@ -595,7 +595,7 @@ size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw)
 	if (rw == READ)
 		ret_sz = pread64(fd, buf, count, mapped);
 	else
-		ret_sz = pwrite64(fd, buf, count, mapped);
+		ret_sz = btrfs_pwrite(fd, buf, count, mapped, true);
 
 	if (ret_sz != count)
 		return ret_sz;
diff --git a/mkfs/common.c b/mkfs/common.c
index 20a7d1155972..5c8d6ac13a3b 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -54,7 +54,7 @@ static int btrfs_write_empty_tree(int fd, struct btrfs_mkfs_config *cfg,
 	btrfs_set_header_nritems(buf, 0);
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, cfg->nodesize, block);
+	ret = btrfs_pwrite(fd, buf->data, cfg->nodesize, block, cfg->zone_size);
 	if (ret != cfg->nodesize)
 		return ret < 0 ? -errno : -EIO;
 	return 0;
@@ -134,7 +134,8 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 			     cfg->csum_type);
 
 	/* write back root tree */
-	ret = pwrite(fd, buf->data, cfg->nodesize, cfg->blocks[MKFS_ROOT_TREE]);
+	ret = btrfs_pwrite(fd, buf->data, cfg->nodesize,
+			   cfg->blocks[MKFS_ROOT_TREE], cfg->zone_size);
 	if (ret != cfg->nodesize)
 		return (ret < 0 ? -errno : -EIO);
 
@@ -422,7 +423,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_header_nritems(buf, nritems);
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, cfg->nodesize, cfg->blocks[MKFS_EXTENT_TREE]);
+	ret = btrfs_pwrite(fd, buf->data, cfg->nodesize,
+			   cfg->blocks[MKFS_EXTENT_TREE], cfg->zone_size);
 	if (ret != cfg->nodesize) {
 		ret = (ret < 0 ? -errno : -EIO);
 		goto out;
@@ -510,7 +512,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_header_nritems(buf, nritems);
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, cfg->nodesize, cfg->blocks[MKFS_CHUNK_TREE]);
+	ret = btrfs_pwrite(fd, buf->data, cfg->nodesize,
+			   cfg->blocks[MKFS_CHUNK_TREE], cfg->zone_size);
 	if (ret != cfg->nodesize) {
 		ret = (ret < 0 ? -errno : -EIO);
 		goto out;
@@ -550,7 +553,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_header_nritems(buf, nritems);
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, cfg->nodesize, cfg->blocks[MKFS_DEV_TREE]);
+	ret = btrfs_pwrite(fd, buf->data, cfg->nodesize,
+			   cfg->blocks[MKFS_DEV_TREE], cfg->zone_size);
 	if (ret != cfg->nodesize) {
 		ret = (ret < 0 ? -errno : -EIO);
 		goto out;

From patchwork Mon Sep 27 04:15:52 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12519067
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: David Sterba, Naohiro Aota
Subject: [PATCH 3/5] btrfs-progs: introduce btrfs_pread wrapper for pread
Date: Mon, 27 Sep 2021 13:15:52 +0900
Message-Id: <20210927041554.325884-4-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210927041554.325884-1-naohiro.aota@wdc.com>
References: <20210927041554.325884-1-naohiro.aota@wdc.com>
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Wrap pread with btrfs_pread as well.

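The read side of the bounce-buffer technique can be sketched on its own. This
is a simplified illustration, not the series' btrfs_direct_pio(): the name
bounce_pread() is hypothetical, the alignment is passed in by the caller
rather than queried with ioctl(), every read goes through the bounce buffer,
and failures return -1 with errno set.

```c
#define _POSIX_C_SOURCE 200809L	/* posix_memalign(), pread(), mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to count bytes at offset through an aligned
 * bounce buffer, so the caller's buf and count need not be aligned.
 * Returns the number of bytes copied into buf, or -1 with errno set.
 */
static ssize_t bounce_pread(int fd, void *buf, size_t count, off_t offset,
			    size_t alignment)
{
	void *bounce = NULL;
	size_t iosize;
	ssize_t ret;
	int err;

	/* posix_memalign() needs a power of two that is >= sizeof(void *) */
	assert(alignment >= sizeof(void *) &&
	       (alignment & (alignment - 1)) == 0);

	/* Round the request up to a whole number of aligned blocks */
	iosize = (count + alignment - 1) & ~(alignment - 1);

	err = posix_memalign(&bounce, alignment, iosize);
	if (err) {
		errno = err;
		return -1;
	}

	ret = pread(fd, bounce, iosize, offset);
	if (ret >= 0) {
		/* Never hand back more than the caller asked for */
		if ((size_t)ret > count)
			ret = (ssize_t)count;
		memcpy(buf, bounce, (size_t)ret);
	}

	free(bounce);
	return ret;
}
```

The offset still has to be aligned by the caller when the descriptor is
really opened with O_DIRECT; the sketch only fixes up the buffer address and
the length.
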
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 common/device-utils.h     | 10 ++++++++++
 kernel-shared/disk-io.c   |  4 +++-
 kernel-shared/extent_io.c |  7 ++++---
 kernel-shared/zoned.c     |  2 +-
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/common/device-utils.h b/common/device-utils.h
index 767dab4370e1..f79e746840fc 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -60,9 +60,19 @@ static inline ssize_t btrfs_pwrite(int fd, void *buf, size_t count,
 
 	return btrfs_direct_pio(WRITE, fd, buf, count, offset);
 }
+static inline ssize_t btrfs_pread(int fd, void *buf, size_t count, off_t offset,
+				  bool direct)
+{
+	if (!direct)
+		return pread(fd, buf, count, offset);
+
+	return btrfs_direct_pio(READ, fd, buf, count, offset);
+}
 #else
 #define btrfs_pwrite(fd, buf, count, offset, direct) \
 	({ (void)(direct); pwrite(fd, buf, count, offset); })
+#define btrfs_pread(fd, buf, count, offset, direct) \
+	({ (void)(direct); pread(fd, buf, count, offset); })
 #endif
 
 #endif
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 1cda6f3a98af..740500f9fdc9 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -35,6 +35,7 @@
 #include "kernel-shared/print-tree.h"
 #include "common/rbtree-utils.h"
 #include "common/device-scan.h"
+#include "common/device-utils.h"
 #include "crypto/hash.h"
 
 /* specified errno for check_tree_block */
@@ -476,7 +477,8 @@ int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
 		goto err;
 	}
 
-	ret = pread64(device->fd, data, *len, multi->stripes[0].physical);
+	ret = btrfs_pread(device->fd, data, *len, multi->stripes[0].physical,
+			  fs_info->zoned);
 	if (ret != *len)
 		ret = -EIO;
 	else
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index b5984949f431..af09ade4025f 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -793,7 +793,8 @@ int read_extent_from_disk(struct extent_buffer *eb,
 			  unsigned long offset, unsigned long len)
 {
 	int ret;
-	ret = pread(eb->fd, eb->data + offset, len, eb->dev_bytenr);
+	ret = btrfs_pread(eb->fd, eb->data + offset, len, eb->dev_bytenr,
+			  eb->fs_info->zoned);
 	if (ret < 0) {
 		ret = -errno;
 		goto out;
@@ -850,8 +851,8 @@ int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
 			return -EIO;
 		}
 
-		ret = pread(device->fd, buf + total_read, read_len,
-			    multi->stripes[0].physical);
+		ret = btrfs_pread(device->fd, buf + total_read, read_len,
+				  multi->stripes[0].physical, info->zoned);
 		kfree(multi);
 		if (ret < 0) {
 			fprintf(stderr, "Error reading %llu, %d\n", offset,
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index c2cce3b5f366..f5d2299fc744 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -593,7 +593,7 @@ size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw)
 		return ret;
 
 	if (rw == READ)
-		ret_sz = pread64(fd, buf, count, mapped);
+		ret_sz = btrfs_pread(fd, buf, count, mapped, true);
 	else
 		ret_sz = btrfs_pwrite(fd, buf, count, mapped, true);
 

From patchwork Mon Sep 27 04:15:53 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12519061
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: David Sterba, Naohiro Aota
Subject: [PATCH 4/5] btrfs-progs: temporally set zoned flag for initial tree reading
Date: Mon, 27 Sep 2021 13:15:53 +0900
Message-Id: <20210927041554.325884-5-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210927041554.325884-1-naohiro.aota@wdc.com>
References: <20210927041554.325884-1-naohiro.aota@wdc.com>
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Functions that read data/metadata, e.g. read_extent_from_disk(), now
depend on the fs_info->zoned flag to decide whether to do direct-IO. The
flag (and zone_size) is not known before reading the chunk tree, and it
is set to 0 during the initial chunk tree setup. That causes
btrfs_pread() to fail because it does not align the buffer.

Use fcntl() to find out whether the file descriptor was opened with
O_DIRECT and, if it was, temporarily set the zoned flag to 1 for this
initial process.

Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 kernel-shared/disk-io.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 740500f9fdc9..dd48599a5f1f 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1302,10 +1302,22 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, struct open_ctree_flags *oc
 	if (ret)
 		goto out_devices;
 
+	/*
+	 * fs_info->zone_size (and zoned) are not known before reading the
+	 * chunk tree, so it's 0 at this point. But, fs_info->zoned == 0
+	 * will cause btrfs_pread() not to use an aligned bounce buffer,
+	 * causing EINVAL when the file is opened with O_DIRECT. Temporally
+	 * set zoned = 1 in that case.
+	 */
+	if (fcntl(fp, F_GETFL) & O_DIRECT)
+		fs_info->zoned = 1;
+
 	ret = btrfs_setup_chunk_tree_and_device_map(fs_info, ocf->chunk_tree_bytenr);
 	if (ret)
 		goto out_chunk;
 
+	fs_info->zoned = 0;
+
 	/* Chunk tree root is unable to read, return directly */
 	if (!fs_info->chunk_root)
 		return fs_info;

From patchwork Mon Sep 27 04:15:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12519065
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: David Sterba, Naohiro Aota
Subject: [PATCH 5/5] btrfs-progs: use direct-IO for zoned device
Date: Mon, 27 Sep 2021 13:15:54 +0900
Message-Id: <20210927041554.325884-6-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210927041554.325884-1-naohiro.aota@wdc.com>
References: <20210927041554.325884-1-naohiro.aota@wdc.com>
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

We need to use direct-IO for zoned devices to preserve the write
ordering. Instead of detecting whether the device is zoned, we simply
use direct-IO for any kind of device (even in emulated zoned mode on a
regular device).

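The two O_DIRECT decisions in this series can be sketched as standalone
helpers. Both names are hypothetical and neither is part of btrfs-progs:
direct_oflags() mirrors the condition the diff in this patch applies before
open(), with the zoned_model() lookup replaced by a plain bool, and
fd_is_direct() mirrors the fcntl(F_GETFL) probe from patch 4/5 with an added
check for fcntl() failing, an assumption of this sketch, since an error
return of -1 has every flag bit set.

```c
#define _GNU_SOURCE	/* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: add O_DIRECT only for read-write opens of a
 * host-managed device. host_managed stands in for
 * zoned_model(path) == ZONED_HOST_MANAGED.
 */
static int direct_oflags(int oflags, bool host_managed)
{
	if ((oflags & O_RDWR) && host_managed)
		oflags |= O_DIRECT;
	return oflags;
}

/* Hypothetical helper: true only if fd is valid and has O_DIRECT set. */
static bool fd_is_direct(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	/* fcntl() returns -1 on error, and -1 & O_DIRECT is non-zero */
	if (flags == -1)
		return false;
	return (flags & O_DIRECT) != 0;
}
```

Because O_RDONLY is 0 on Linux, the (oflags & O_RDWR) test leaves read-only
opens untouched, so read-only tools keep using buffered I/O.
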
Signed-off-by: Naohiro Aota
---
 kernel-shared/disk-io.c | 3 +++
 kernel-shared/volumes.c | 4 ++++
 mkfs/main.c             | 7 ++++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index dd48599a5f1f..aabeba7821ed 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1382,6 +1382,9 @@ struct btrfs_fs_info *open_ctree_fs_info(struct open_ctree_flags *ocf)
 	if (!(ocf->flags & OPEN_CTREE_WRITES))
 		oflags = O_RDONLY;
 
+	if ((oflags & O_RDWR) && zoned_model(ocf->filename) == ZONED_HOST_MANAGED)
+		oflags |= O_DIRECT;
+
 	fp = open(ocf->filename, oflags);
 	if (fp < 0) {
 		error("cannot open '%s': %m", ocf->filename);
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index b2a6b04f8e3d..ff4bd0723dbb 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -455,6 +455,10 @@ int btrfs_open_devices(struct btrfs_fs_info *fs_info,
 			continue;
 		}
 
+		if ((flags & O_RDWR) &&
+		    zoned_model(device->name) == ZONED_HOST_MANAGED)
+			flags |= O_DIRECT;
+
 		fd = open(device->name, flags);
 		if (fd < 0) {
 			ret = -errno;
diff --git a/mkfs/main.c b/mkfs/main.c
index b925c572b2b3..01187763a90c 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -894,6 +894,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	int ssd = 0;
 	int zoned = 0;
 	int force_overwrite = 0;
+	int oflags;
 	char *source_dir = NULL;
 	bool source_dir_set = false;
 	bool shrink_rootdir = false;
@@ -1310,12 +1311,16 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 	dev_cnt--;
 
+	oflags = O_RDWR;
+	if (zoned && zoned_model(file) == ZONED_HOST_MANAGED)
+		oflags |= O_DIRECT;
+
 	/*
 	 * Open without O_EXCL so that the problem should not occur by the
 	 * following operation in kernel:
 	 * (btrfs_register_one_device() fails if O_EXCL is on)
 	 */
-	fd = open(file, O_RDWR);
+	fd = open(file, oflags);
 	if (fd < 0) {
 		error("unable to open %s: %m", file);
 		goto error;