From patchwork Thu Apr 27 17:23:36 2023
X-Patchwork-Id: 13225638
From: Sam Li
To: qemu-devel@nongnu.org
Cc: dlemoal@kernel.org, dmitry.fomichev@wdc.com, Aarushi Mehta, qemu-block@nongnu.org, Kevin Wolf, Stefan Hajnoczi, Julia Suvorova, Hanna Reitz, hare@suse.de, Fam Zheng, Stefano Garzarella, Sam Li
Subject: [PATCH v10 1/4] file-posix: add tracking of the zone write pointers
Date: Fri, 28 Apr 2023 01:23:36 +0800
Message-Id: <20230427172339.3709-2-faithilikerun@gmail.com>
In-Reply-To: <20230427172339.3709-1-faithilikerun@gmail.com>

Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to
emulate zone append using regular writes. To do this, the file-posix
driver tracks the write pointer (wp) location of every zone of the
device in an array of uint64_t. The most significant bit of each wp
location indicates whether the zone is a conventional zone.
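For illustration only, the bookkeeping described in the previous paragraph can be modelled in a few lines of standalone C. The sketch assumes a fixed array of four 1 MiB zones and uses hypothetical names (ZoneWps, ZT_IS_CONV(), zone_init(), zone_advance(), zone_reset(), zone_finish()). It follows the patch's convention of flagging conventional zones with bit 63 and leaving their wp untouched on writes, but it is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the wp array: bit 63 flags a conventional zone,
 * the remaining bits hold the wp in bytes. The names, zone count and zone
 * size are illustrative assumptions, not QEMU definitions.
 */
#define ZT_CONV_FLAG   (1ULL << 63)
#define ZT_IS_CONV(wp) (((wp) & ZT_CONV_FLAG) != 0)
#define NR_ZONES       4
#define ZONE_SIZE      (1ULL << 20) /* assumed 1 MiB zones */

typedef struct {
    uint64_t wp[NR_ZONES];
} ZoneWps;

/* Conventional zones only carry the flag; sequential zones start empty. */
static void zone_init(ZoneWps *w, unsigned int idx, bool conventional)
{
    assert(idx < NR_ZONES);
    w->wp[idx] = conventional ? ZT_CONV_FLAG : idx * ZONE_SIZE;
}

/* After a successful write, move a sequential zone's wp past the data. */
static void zone_advance(ZoneWps *w, uint64_t offset, uint64_t bytes)
{
    assert(offset / ZONE_SIZE < NR_ZONES);
    uint64_t *wp = &w->wp[offset / ZONE_SIZE];
    if (!ZT_IS_CONV(*wp) && offset + bytes > *wp) {
        *wp = offset + bytes;
    }
}

/* Zone reset (sequential zones only): the wp returns to the zone start. */
static void zone_reset(ZoneWps *w, unsigned int idx)
{
    assert(idx < NR_ZONES);
    w->wp[idx] = idx * ZONE_SIZE;
}

/* Zone finish (sequential zones only): the wp moves to the zone end. */
static void zone_finish(ZoneWps *w, unsigned int idx)
{
    assert(idx < NR_ZONES);
    w->wp[idx] = (idx + 1) * ZONE_SIZE;
}
```

Suppose zone 0 is marked conventional. A 4 KiB write at the start of zone 1 then moves that zone's wp to ZONE_SIZE + 4096. A later write that ends below the wp leaves it unchanged, and the same call on zone 0 does nothing. zone_reset() and zone_finish() correspond to the reset and finish cases that the patch handles in raw_co_zone_mgmt().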
A zone's wp can be changed by the following operations:
- zone reset: moves the wp to the start offset of that zone
- zone finish: moves the wp to the end of that zone
- write to a zone: advances the wp to the end of the written data
- zone append: advances the wp to the end of the appended data

Signed-off-by: Sam Li
---
 block/file-posix.c               | 177 ++++++++++++++++++++++++++++++-
 include/block/block-common.h     |  14 +++
 include/block/block_int-common.h |   5 +
 3 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 701acddbca..c0c83c6631 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1327,9 +1327,93 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 }
 
 #if defined(CONFIG_BLKZONED)
+/*
+ * If reset_all is true, reset the wp of every zone that is not full,
+ * read-only or offline to the start of that zone. Otherwise, use the
+ * wp reported by the device.
+ */
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
+                        unsigned int nrz, bool reset_all)
+{
+    struct blk_zone *blkz;
+    size_t rep_size;
+    uint64_t sector = offset >> BDRV_SECTOR_BITS;
+    BlockZoneWps *wps = bs->wps;
+    unsigned int j = offset / bs->bl.zone_size;
+    unsigned int n = 0, i = 0;
+    int ret;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+
+    rep = g_malloc(rep_size);
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = sector;
+        rep->nr_zones = nrz - n;
+
+        do {
+            ret = ioctl(fd, BLKREPORTZONE, rep);
+        } while (ret != 0 && errno == EINTR);
+        if (ret != 0) {
+            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+                         fd, offset, errno);
+            return -errno;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
+            /*
+             * The wp tracking cares only about sequential write required and
+             * sequential write preferred zones so that the wp can advance to
+             * the right location.
+             * Use the most significant bit of the wp location to indicate the
+             * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+             */
+            if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+                wps->wp[j] |= 1ULL << 63;
+            } else {
+                switch(blkz[i].cond) {
+                case BLK_ZONE_COND_FULL:
+                case BLK_ZONE_COND_READONLY:
+                    /* Zone not writable */
+                    wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
+                    break;
+                case BLK_ZONE_COND_OFFLINE:
+                    /* Zone neither writable nor readable */
+                    wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
+                    break;
+                default:
+                    if (reset_all) {
+                        wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
+                    } else {
+                        wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
+                    }
+                    break;
+                }
+            }
+        }
+        sector = blkz[i - 1].start + blkz[i - 1].len;
+    }
+
+    return 0;
+}
+
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
+                            unsigned int nrz)
+{
+    if (get_zones_wp(bs, fd, offset, nrz, false) < 0) {
+        error_report("update zone wp failed");
+    }
+}
+
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
                                      Error **errp)
 {
+    BDRVRawState *s = bs->opaque;
     BlockZoneModel zoned;
     int ret;
 
@@ -1380,6 +1464,23 @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
     if (ret > 0) {
         bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
     }
+
+    ret = get_sysfs_long_val(st, "physical_block_size");
+    if (ret >= 0) {
+        bs->bl.write_granularity = ret;
+    }
+
+    /* The refresh_limits() function can be called multiple times. */
+    g_free(bs->wps);
+    bs->wps = g_malloc0(sizeof(BlockZoneWps) +
+            sizeof(int64_t) * bs->bl.nr_zones);
+    ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, false);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "report wps failed");
+        g_clear_pointer(&bs->wps, g_free);
+        return;
+    }
+    qemu_co_mutex_init(&bs->wps->colock);
 }
 #else /* !defined(CONFIG_BLKZONED) */
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
@@ -2351,9 +2452,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 {
     BDRVRawState *s = bs->opaque;
     RawPosixAIOData acb;
+    int ret;
 
     if (fd_open(bs) < 0)
         return -EIO;
+#if defined(CONFIG_BLKZONED)
+    if (type & QEMU_AIO_WRITE && bs->wps) {
+        qemu_co_mutex_lock(&bs->wps->colock);
+    }
+#endif
 
     /*
      * When using O_DIRECT, the request must be aligned to be able to use
@@ -2366,12 +2473,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 #ifdef CONFIG_LINUX_IO_URING
     } else if (s->use_linux_io_uring) {
         assert(qiov->size == bytes);
-        return luring_co_submit(bs, s->fd, offset, qiov, type);
+        ret = luring_co_submit(bs, s->fd, offset, qiov, type);
+        goto out;
 #endif
 #ifdef CONFIG_LINUX_AIO
     } else if (s->use_linux_aio) {
         assert(qiov->size == bytes);
-        return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
+        ret = laio_co_submit(s->fd, offset, qiov, type,
+                             s->aio_max_batch);
+        goto out;
 #endif
     }
 
@@ -2388,7 +2498,35 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
     };
 
     assert(qiov->size == bytes);
-    return raw_thread_pool_submit(handle_aiocb_rw, &acb);
+    ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
+    goto out; /* Avoid an unused-label compiler error */
+
+out:
+#if defined(CONFIG_BLKZONED)
+{
+    BlockZoneWps *wps = bs->wps;
+    if (ret == 0) {
+        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
+            uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
+            if (!BDRV_ZT_IS_CONV(*wp)) {
+                /* Advance the wp if needed */
+                if (offset + bytes > *wp) {
+                    *wp = offset + bytes;
+                }
+            }
+        }
+    } else {
+        if (type & QEMU_AIO_WRITE) {
+            update_zones_wp(bs, s->fd, 0, 1);
+        }
+    }
+
+    if (type & QEMU_AIO_WRITE && wps) {
+        qemu_co_mutex_unlock(&wps->colock);
+    }
+}
+#endif
+    return ret;
 }
 
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
@@ -2491,6 +2629,9 @@ static void raw_close(BlockDriverState *bs)
     BDRVRawState *s = bs->opaque;
 
     if (s->fd >= 0) {
+#if defined(CONFIG_BLKZONED)
+        g_free(bs->wps);
+#endif
         qemu_close(s->fd);
         s->fd = -1;
     }
@@ -3288,6 +3429,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     const char *op_name;
     unsigned long zo;
     int ret;
+    BlockZoneWps *wps = bs->wps;
     int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
 
     zone_size = bs->bl.zone_size;
@@ -3305,6 +3447,14 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
         return -EINVAL;
     }
 
+    uint32_t i = offset / bs->bl.zone_size;
+    uint32_t nrz = len / bs->bl.zone_size;
+    uint64_t *wp = &wps->wp[i];
+    if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
+        error_report("zone mgmt operations are not allowed for conventional zones");
+        return -EIO;
+    }
+
     switch (op) {
     case BLK_ZO_OPEN:
         op_name = "BLKOPENZONE";
@@ -3342,7 +3492,28 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
                         len >> BDRV_SECTOR_BITS);
     ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
     if (ret != 0) {
+        update_zones_wp(bs, s->fd, offset, nrz);
+        ret = -errno;
         error_report("ioctl %s failed %d", op_name, ret);
+        return ret;
+    }
+
+    if (zo == BLKRESETZONE && len == capacity) {
+        ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
+        if (ret < 0) {
+            error_report("reporting single wp failed");
+            return ret;
+        }
+    } else if (zo == BLKRESETZONE) {
+        for (unsigned int j = 0; j < nrz; ++j) {
+            wp[j] = offset + j * zone_size;
+        }
+    } else if (zo == BLKFINISHZONE) {
+        for (unsigned int j = 0; j < nrz; ++j) {
+            /* The zoned device allows the last zone smaller than the
+             * zone size. */
+            wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
+        }
     }
 
     return ret;
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 1576fcf2ed..93196229ac 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -118,6 +118,14 @@ typedef struct BlockZoneDescriptor {
     BlockZoneState state;
 } BlockZoneDescriptor;
 
+/*
+ * Track the write pointer of each zone, in bytes.
+ */
+typedef struct BlockZoneWps {
+    CoMutex colock;
+    uint64_t wp[];
+} BlockZoneWps;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
@@ -240,6 +248,12 @@ typedef enum {
 #define BDRV_SECTOR_BITS   9
 #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
 
+/*
+ * Get the most significant bit of wp. If it is zero, the zone type
+ * is SWR or SWP; if it is set, the zone is conventional.
+ */
+#define BDRV_ZT_IS_CONV(wp)    ((wp) & (1ULL << 63))
+
 #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                            INT_MAX >> BDRV_SECTOR_BITS)
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 3482cfa79e..d138a399c7 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -890,6 +890,8 @@ typedef struct BlockLimits {
 
     /* maximum number of active zones */
     int64_t max_active_zones;
+
+    int64_t write_granularity;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
@@ -1251,6 +1253,9 @@ struct BlockDriverState {
     CoMutex bsc_modify_lock;
     /* Always non-NULL, but must only be dereferenced under an RCU read guard */
     BdrvBlockStatusCache *block_status_cache;
+
+    /* Array of write pointer locations, one per zone of the zoned device. */
+    BlockZoneWps *wps;
 };
 
 struct BlockBackendRootState {

From patchwork Thu Apr 27 17:23:37 2023
X-Patchwork-Id: 13225637
From: Sam Li
To: qemu-devel@nongnu.org
Cc: dlemoal@kernel.org, dmitry.fomichev@wdc.com, Aarushi Mehta, qemu-block@nongnu.org, Kevin Wolf, Stefan Hajnoczi, Julia Suvorova, Hanna Reitz, hare@suse.de, Fam Zheng, Stefano Garzarella, Sam Li
Subject: [PATCH v10 2/4] block: introduce zone append write for zoned devices
Date: Fri, 28 Apr 2023 01:23:37 +0800
Message-Id: <20230427172339.3709-3-faithilikerun@gmail.com>
In-Reply-To: <20230427172339.3709-1-faithilikerun@gmail.com>

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset passed in the call must
point at the start of the zone to which the data is being appended; the
data itself is written at the zone's current write pointer. Upon
completion, the position where the data has been written in the zone is
returned to the caller through the offset argument.
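The offset contract described above can be modelled in a few lines. The following is a hypothetical, self-contained sketch, not QEMU code. zone_append() mirrors what raw_co_zone_append() and raw_co_prw() do together in this patch: it rejects an offset that is not zone aligned, writes at the zone's current wp, hands that wp back through the offset pointer, and advances the wp. It omits the actual pwritev(), the locking and the write-granularity check. The -ENOSPC capacity check is an extra assumption that the patch itself does not perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative assumptions: 4 zones of 1 MiB, wps stored in bytes. */
#define NR_ZONES  4
#define ZONE_SIZE (1ULL << 20)

typedef struct {
    uint64_t wp[NR_ZONES];
} ZoneWps;

/*
 * Hypothetical zone append: *offset must be the start of the target zone.
 * On success, *offset is replaced by the position the data was "written"
 * to (the zone's wp before the call) and the wp advances by `bytes`.
 * Returns 0 or a negative errno value; *offset is untouched on error.
 */
static int zone_append(ZoneWps *w, int64_t *offset, uint64_t bytes)
{
    assert(w && offset);
    if (*offset < 0 || (uint64_t)*offset % ZONE_SIZE != 0 ||
        (uint64_t)*offset / ZONE_SIZE >= NR_ZONES) {
        return -EINVAL;               /* not the start of a zone */
    }

    uint64_t zone_start = (uint64_t)*offset;
    uint64_t *wp = &w->wp[zone_start / ZONE_SIZE];
    if (*wp + bytes > zone_start + ZONE_SIZE) {
        return -ENOSPC;               /* assumption: not checked in the patch */
    }

    /* A real driver would issue the write at *wp here. */
    *offset = (int64_t)*wp;           /* report where the data landed */
    *wp += bytes;
    return 0;
}
```

Two back-to-back appends to zone 1 that use the same input offset (the zone start) therefore return different output offsets, ZONE_SIZE and ZONE_SIZE + 4096. This is how a caller learns where each append landed.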
Signed-off-by: Sam Li
Reviewed-by: Dmitry Fomichev
Reviewed-by: Stefan Hajnoczi
---
 block/block-backend.c             | 61 +++++++++++++++++++++++++++++++
 block/file-posix.c                | 58 +++++++++++++++++++++++++----
 block/io.c                        | 27 ++++++++++++++
 block/io_uring.c                  |  4 ++
 block/linux-aio.c                 |  3 ++
 block/raw-format.c                |  8 ++++
 include/block/block-io.h          |  4 ++
 include/block/block_int-common.h  |  3 ++
 include/block/raw-aio.h           |  4 +-
 include/sysemu/block-backend-io.h |  9 +++++
 10 files changed, 173 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 67722eb46d..aa8657e5c8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1929,6 +1929,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
     return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
+                                   rwco->iobuf, rwco->flags);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+                                QEMUIOVector *qiov, BdrvRequestFlags flags,
+                                BlockCompletionFunc *cb, void *opaque) {
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk    = blk,
+        .ret    = NOT_DONE,
+        .flags  = flags,
+        .iobuf  = qiov,
+    };
+    acb->bytes = (int64_t)(uintptr_t)offset;
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1982,6 +2021,28 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
     return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+        QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+
+    ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index c0c83c6631..8fc7f73d2c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -160,6 +160,7 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool use_linux_aio:1;
     bool use_linux_io_uring:1;
+    int64_t *offset; /* offset of zone append operation */
     int page_cache_inconsistent; /* errno from fdatasync failure */
     bool has_fallocate;
     bool needs_alignment;
@@ -1702,7 +1703,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
     ssize_t len;
 
     len = RETRY_ON_EINTR(
-        (aiocb->aio_type & QEMU_AIO_WRITE) ?
+        (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
             qemu_pwritev(aiocb->aio_fildes,
                            aiocb->io.iov,
                            aiocb->io.niov,
@@ -1731,7 +1732,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
     ssize_t len;
 
     while (offset < aiocb->aio_nbytes) {
-        if (aiocb->aio_type & QEMU_AIO_WRITE) {
+        if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
             len = pwrite(aiocb->aio_fildes,
                          (const char *)buf + offset,
                          aiocb->aio_nbytes - offset,
@@ -1824,7 +1825,7 @@ static int handle_aiocb_rw(void *opaque)
     }
 
     nbytes = handle_aiocb_rw_linear(aiocb, buf);
-    if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
+    if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
         char *p = buf;
         size_t count = aiocb->aio_nbytes, copy;
         int i;
@@ -2457,8 +2458,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
     if (fd_open(bs) < 0)
         return -EIO;
 #if defined(CONFIG_BLKZONED)
-    if (type & QEMU_AIO_WRITE && bs->wps) {
+    if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
         qemu_co_mutex_lock(&bs->wps->colock);
+        if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
+            int index = offset / bs->bl.zone_size;
+            offset = bs->wps->wp[index];
+        }
     }
 #endif
 
@@ -2506,9 +2511,13 @@ out:
 {
     BlockZoneWps *wps = bs->wps;
     if (ret == 0) {
-        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
+        if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
+            && wps && bs->bl.zone_size) {
             uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
             if (!BDRV_ZT_IS_CONV(*wp)) {
+                if (type & QEMU_AIO_ZONE_APPEND) {
+                    *s->offset = *wp;
+                }
                 /* Advance the wp if needed */
                 if (offset + bytes > *wp) {
                     *wp = offset + bytes;
@@ -2516,12 +2525,12 @@ out:
             }
         }
     } else {
-        if (type & QEMU_AIO_WRITE) {
+        if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
             update_zones_wp(bs, s->fd, 0, 1);
         }
     }
 
-    if (type & QEMU_AIO_WRITE && wps) {
+    if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
         qemu_co_mutex_unlock(&wps->colock);
     }
 }
@@ -3520,6 +3529,40 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 }
 #endif
 
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
+                                           int64_t *offset,
+                                           QEMUIOVector *qiov,
+                                           BdrvRequestFlags flags) {
+    assert(flags == 0);
+    int64_t zone_size_mask = bs->bl.zone_size - 1;
+    int64_t iov_len = 0;
+    int64_t len = 0;
+    BDRVRawState *s = bs->opaque;
+    s->offset = offset;
+
+    if (*offset & zone_size_mask) {
+        error_report("sector offset %" PRId64 " is not aligned to zone size "
+                     "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
+        return -EINVAL;
+    }
+
+    int64_t wg = bs->bl.write_granularity;
+    int64_t wg_mask = wg - 1;
+    for (int i = 0; i < qiov->niov; i++) {
+        iov_len = qiov->iov[i].iov_len;
+        if (iov_len & wg_mask) {
+            error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
+                         "block size %" PRId64 "", i, iov_len, wg);
+            return -EINVAL;
+        }
+        len += iov_len;
+    }
+
+    return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
+}
+#endif
+
 static coroutine_fn int
 raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
                 bool blkdev)
@@ -4281,6 +4324,7 @@ static BlockDriver bdrv_host_device = {
     /* zone management operations */
     .bdrv_co_zone_report = raw_co_zone_report,
     .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+    .bdrv_co_zone_append = raw_co_zone_append,
 #endif
 };
 
diff --git a/block/io.c b/block/io.c
index 74bab69b0f..20d1da8dc9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3156,6 +3156,33 @@ out:
     return co.ret;
 }
 
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
+                        QEMUIOVector *qiov,
+                        BdrvRequestFlags flags)
+{
+    int ret;
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     IO_CODE();
diff --git a/block/io_uring.c b/block/io_uring.c
index 989f9a99ed..82cab6a5bd 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -350,6 +350,10 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
         io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
                              luringcb->qiov->niov, offset);
         break;
+    case QEMU_AIO_ZONE_APPEND:
+        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
+                             luringcb->qiov->niov, offset);
+        break;
     case QEMU_AIO_READ:
         io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
                             luringcb->qiov->niov, offset);
diff --git a/block/linux-aio.c b/block/linux-aio.c
index fc50cdd1bf..442c86209b 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -394,6 +394,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
     case QEMU_AIO_WRITE:
         io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
+    case QEMU_AIO_ZONE_APPEND:
+        io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
+        break;
     case QEMU_AIO_READ:
         io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
diff --git a/block/raw-format.c b/block/raw-format.c
index 1a1dce8da4..9816f1af80 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -332,6 +332,13 @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
+                   BdrvRequestFlags flags)
+{
+    return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 raw_co_getlength(BlockDriverState *bs)
 {
@@ -637,6 +644,7 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pdiscard             = &raw_co_pdiscard,
     .bdrv_co_zone_report          = &raw_co_zone_report,
     .bdrv_co_zone_mgmt            = &raw_co_zone_mgmt,
+    .bdrv_co_zone_append          = &raw_co_zone_append,
     .bdrv_co_block_status         = &raw_co_block_status,
     .bdrv_co_copy_range_from      = &raw_co_copy_range_from,
     .bdrv_co_copy_range_to        = &raw_co_copy_range_to,
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 58f415ab64..f44e524a1c 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -119,6 +119,10 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
 int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
                                                 BlockZoneOp op,
                                                 int64_t offset, int64_t len);
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
+                                                  int64_t *offset,
+                                                  QEMUIOVector *qiov,
+                                                  BdrvRequestFlags flags);
 
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index d138a399c7..9a20ff1768 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -722,6 +722,9 @@ struct BlockDriver {
             BlockZoneDescriptor *zones);
     int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
             int64_t offset, int64_t len);
+    int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
+            int64_t *offset, QEMUIOVector *qiov,
+            BdrvRequestFlags flags);
 
     /* removable device specific */
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index afb9bdf51b..0fe85ade77 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -30,6 +30,7 @@
 #define QEMU_AIO_TRUNCATE     0x0080
 #define QEMU_AIO_ZONE_REPORT  0x0100
 #define QEMU_AIO_ZONE_MGMT    0x0200
+#define QEMU_AIO_ZONE_APPEND  0x0400
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ | \
          QEMU_AIO_WRITE | \
@@ -40,7 +41,8 @@
          QEMU_AIO_COPY_RANGE | \
          QEMU_AIO_TRUNCATE | \
          QEMU_AIO_ZONE_REPORT | \
-         QEMU_AIO_ZONE_MGMT)
+         QEMU_AIO_ZONE_MGMT | \
+         QEMU_AIO_ZONE_APPEND)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index eb1c1ebfec..d62a7ee773 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -53,6 +53,9 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
 BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                               int64_t offset, int64_t len,
                               BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+                                QEMUIOVector *qiov, BdrvRequestFlags flags,
+                                BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
                              BlockCompletionFunc *cb, void *opaque);
 void blk_aio_cancel_async(BlockAIOCB *acb);
@@ -208,6 +211,12 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                                   int64_t offset, int64_t len);
 int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                                    int64_t offset, int64_t len);
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+                                    QEMUIOVector *qiov,
+                                    BdrvRequestFlags flags);
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
+                                     QEMUIOVector *qiov,
+                                     BdrvRequestFlags flags);
 
 int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
                                   int64_t bytes);

From patchwork Thu Apr 27 17:23:38 2023
X-Patchwork-Id: 13225639
eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ps5M2-0008Om-OR; Thu, 27 Apr 2023 13:24:26 -0400 Received: from mail-yb1-xb32.google.com ([2607:f8b0:4864:20::b32]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1ps5Lz-0007qM-7y; Thu, 27 Apr 2023 13:24:26 -0400 Received: by mail-yb1-xb32.google.com with SMTP id 3f1490d57ef6-b9a805fd0dcso1145413276.1; Thu, 27 Apr 2023 10:24:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682616260; x=1685208260; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KDeNqO7F4FiGzd0ELVM+fuyUs22M96ueXedU39c3dZc=; b=GydNz2e2jE86YduhEDl6jiQ+nM2oFVq85S9SRUjfqQbt3CWhS4Wk/XTG7xtQaUoYAz gT0ql59UqGTF9WBKEyFKzKLjyFClLuiCDtb6mPcIKSDyZTIzUDxrmxiIalwpeUGA19eg WMUQy2eOboi7JT1GIx/hXAUR5CPWG49qlW/SDvtsyOc83It2ke5HWDRZquU2jZ3vk+np fZ+fsdRYW3XcXRuFZbVhr4x2tsjOY/7fKWZcZvt1kFr1t1y22Vfl1d9+c+9njjxWD34N kAmwqi1DhdvqdQPOdhj8ZsdGDR3VYjSCVFQPn7ZrjmtJJh8BY1ePhYyhWX8YU0DKCjKQ TKdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682616260; x=1685208260; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KDeNqO7F4FiGzd0ELVM+fuyUs22M96ueXedU39c3dZc=; b=V57MCJon9cSsKGLNaufYnQqdlccvk8FRMplWAI617wN56GlwpSnqWHOmPAlwFnCOfU 8oUqay0AwCGHeAJBa1K0PXUartABdsuKX/isYAibb4OKvbmedFA9XZylk520cxI9Yfcy kEby0+cTtKXXzEmIQmW3BmxlZSsXh8k0y4pW1Q+1p1hcivUZOmPbCFRKiqDWd26fq9TZ ZhLtwSHM6NIFAd4hX0RFhjjCGlQm4w/p+2eaGQPUS5tQZdL1m5sc8h+sadGhQzdqe35I JPPc6sXoAjgEM1qBMW9QeR52K9Gafy/KfjyS4OFP7arZgy2Hh/bFNXk/18EN+eSv1FdD qf8A== X-Gm-Message-State: AC+VfDxzChLoNwudDHkkGVwHmnV7eABkVgcE0KEIJz5DWzQk2rJewvr3 3qAtnvNCfVOo3JzaUhDSnUUpQUMyfny0pxLwz2s= 
From: Sam Li
To: qemu-devel@nongnu.org
Cc: dlemoal@kernel.org, dmitry.fomichev@wdc.com, Aarushi Mehta,
 qemu-block@nongnu.org, Kevin Wolf, Stefan Hajnoczi, Julia Suvorova,
 Hanna Reitz, hare@suse.de, Fam Zheng, Stefano Garzarella, Sam Li
Subject: [PATCH v10 3/4] qemu-iotests: test zone append operation
Date: Fri, 28 Apr 2023 01:23:38 +0800
Message-Id: <20230427172339.3709-4-faithilikerun@gmail.com>
In-Reply-To: <20230427172339.3709-1-faithilikerun@gmail.com>
References: <20230427172339.3709-1-faithilikerun@gmail.com>

This patch tests zone append writes by reporting the zone write pointer
(wp) after each call completes. The "zap -p" option prints the sector
offset returned on completion, which should be the start sector where
the appended data was written.

Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
 qemu-io-cmds.c                     | 75 ++++++++++++++++++++++++++++++
 tests/qemu-iotests/tests/zoned     | 16 +++++++
 tests/qemu-iotests/tests/zoned.out | 16 +++++++
 3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f35ea627d7..3f75d2f5a6 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1874,6 +1874,80 @@ static const cmdinfo_t zone_reset_cmd = {
     .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+                              int64_t *offset, int flags, int *total)
+{
+    int async_ret = NOT_DONE;
+
+    blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+    while (async_ret == NOT_DONE) {
+        main_loop_wait(false);
+    }
+
+    *total = qiov->size;
+    return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    bool pflag = false;
+    int flags = 0;
+    int total = 0;
+    int64_t offset;
+    char *buf;
+    int c, nr_iov;
+    int pattern = 0xcd;
+    QEMUIOVector qiov;
+
+    if (optind > argc - 3) {
+        return -EINVAL;
+    }
+
+    if ((c = getopt(argc, argv, "p")) != -1) {
+        pflag = true;
+    }
+
+    offset = cvtnum(argv[optind]);
+    if (offset < 0) {
+        print_cvtnum_err(offset, argv[optind]);
+        return offset;
+    }
+    optind++;
+    nr_iov = argc - optind;
+    buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+                       flags & BDRV_REQ_REGISTERED_BUF);
+    if (buf == NULL) {
+        return -EINVAL;
+    }
+    ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+    if (ret < 0) {
+        printf("zone append failed: %s\n", strerror(-ret));
+        goto out;
+    }
+
+    if (pflag) {
+        printf("After zap done, the append sector is 0x%" PRIx64 "\n",
+               tosector(offset));
+    }
+
+out:
+    qemu_io_free(blk, buf, qiov.size,
+                 flags & BDRV_REQ_REGISTERED_BUF);
+    qemu_iovec_destroy(&qiov);
+    return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+    .name = "zone_append",
+    .altname = "zap",
+    .cfunc = zone_append_f,
+    .argmin = 3,
+    .argmax = 4,
+    .args = "offset len [len..]",
+    .oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name = "truncate",
@@ -2672,6 +2746,7 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&zone_close_cmd);
     qemuio_add_command(&zone_finish_cmd);
     qemuio_add_command(&zone_reset_cmd);
+    qemuio_add_command(&zone_append_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
index 56f60616b5..3d23ce9cc1 100755
--- a/tests/qemu-iotests/tests/zoned
+++ b/tests/qemu-iotests/tests/zoned
@@ -82,6 +82,22 @@ echo "(5) resetting the second zone"
 $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
index b2d061da49..fe53ba4744 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,20 @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
+After zap done, the append sector is 0x80000
+After appending the second zone firstly:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
+After zap done, the append sector is 0x80018
+After appending the second zone secondly:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
 *** done

From patchwork Thu Apr 27 17:23:39 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13225636
From: Sam Li
To: qemu-devel@nongnu.org
Cc: dlemoal@kernel.org, dmitry.fomichev@wdc.com, Aarushi Mehta,
 qemu-block@nongnu.org, Kevin Wolf, Stefan Hajnoczi, Julia Suvorova,
 Hanna Reitz, hare@suse.de, Fam Zheng, Stefano Garzarella, Sam Li
Subject: [PATCH v10 4/4] block: add some trace events for zone append
Date: Fri, 28 Apr 2023 01:23:39 +0800
Message-Id: <20230427172339.3709-5-faithilikerun@gmail.com>
In-Reply-To: <20230427172339.3709-1-faithilikerun@gmail.com>
References: <20230427172339.3709-1-faithilikerun@gmail.com>

Signed-off-by: Sam Li
Reviewed-by: Dmitry Fomichev
Reviewed-by: Stefan Hajnoczi
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8fc7f73d2c..5f1745ede8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2517,6 +2517,8 @@ out:
             if (!BDRV_ZT_IS_CONV(*wp)) {
                 if (type & QEMU_AIO_ZONE_APPEND) {
                     *s->offset = *wp;
+                    trace_zbd_zone_append_complete(bs, *s->offset
+                        >> BDRV_SECTOR_BITS);
                 }
                 /* Advance the wp if needed */
                 if (offset + bytes > *wp) {
@@ -3559,6 +3561,7 @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
         len += iov_len;
     }
 
+    trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
     return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
 }
 #endif
diff --git a/block/trace-events b/block/trace-events
index 3f4e1d088a..32665158d6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
 zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
 zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"