From patchwork Thu Apr 20 12:09:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218651 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD319C77B7C for ; Thu, 20 Apr 2023 12:10:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234563AbjDTMKv (ORCPT ); Thu, 20 Apr 2023 08:10:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60622 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234541AbjDTMKs (ORCPT ); Thu, 20 Apr 2023 08:10:48 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC5371FE2 for ; Thu, 20 Apr 2023 05:09:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1AiZxmz/CBIedH7uROXIH01annQLpYiu54dssrBE9oY=; b=GOaIB+Drnsxhj7nPp88ZSDec5PcwxmdReSz+PEHWH4MFGxmoP4p5QVFjFsh9y3+GHsZ5+/ CfeclWyza5x/9djXkIbWcuHncZboiH7Ug+BceDmrrShcb0myS+Pp3tYyzz67TqEICJvpvt dJBtTvarxh6LI0nEOjtriYZEu47WZPo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-27-iOqRg9jFNQq_y3ttfm-9BA-1; Thu, 20 Apr 2023 08:09:54 -0400 X-MC-Unique: iOqRg9jFNQq_y3ttfm-9BA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EEAB8101A531; Thu, 20 Apr 2023 12:09:53 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 22B191410F1C; Thu, 20 Apr 2023 12:09:52 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Damien Le Moal , Hannes Reinecke , Dmitry Fomichev Subject: [PULL 01/20] block/block-common: add zoned device structs Date: Thu, 20 Apr 2023 08:09:29 -0400 Message-Id: <20230420120948.436661-2-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal Reviewed-by: Hannes Reinecke Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-2-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé . --Stefan] Signed-off-by: Stefan Hajnoczi --- include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/include/block/block-common.h b/include/block/block-common.h index b5122ef8ab..1576fcf2ed 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -75,6 +75,49 @@ typedef struct BlockDriver BlockDriver; typedef struct BdrvChild BdrvChild; typedef struct BdrvChildClass BdrvChildClass; +typedef enum BlockZoneOp { + BLK_ZO_OPEN, + BLK_ZO_CLOSE, + BLK_ZO_FINISH, + BLK_ZO_RESET, +} BlockZoneOp; + +typedef enum BlockZoneModel { + BLK_Z_NONE = 0x0, /* Regular block device */ + BLK_Z_HM = 0x1, /* Host-managed zoned block device */ + BLK_Z_HA = 0x2, /* Host-aware zoned block device */ +} BlockZoneModel; + +typedef enum BlockZoneState { + BLK_ZS_NOT_WP = 0x0, + BLK_ZS_EMPTY = 0x1, + BLK_ZS_IOPEN = 0x2, + BLK_ZS_EOPEN = 0x3, + BLK_ZS_CLOSED = 0x4, + BLK_ZS_RDONLY = 0xD, + BLK_ZS_FULL = 0xE, + BLK_ZS_OFFLINE = 0xF, +} BlockZoneState; + +typedef enum BlockZoneType { + BLK_ZT_CONV = 0x1, /* Conventional random writes supported */ + BLK_ZT_SWR = 0x2, /* Sequential writes required */ + BLK_ZT_SWP = 0x3, /* Sequential writes preferred */ +} BlockZoneType; + +/* + * Zone descriptor data structure. + * Provides information on a zone with all position and size values in bytes. + */ +typedef struct BlockZoneDescriptor { + uint64_t start; + uint64_t length; + uint64_t cap; + uint64_t wp; + BlockZoneType type; + BlockZoneState state; +} BlockZoneDescriptor; + typedef struct BlockDriverInfo { /* in bytes, 0 if irrelevant */ int cluster_size; From patchwork Thu Apr 20 12:09:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218650 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF470C77B72 for ; Thu, 20 Apr 2023 12:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234558AbjDTMKs (ORCPT ); Thu, 20 Apr 2023 08:10:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60660 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234545AbjDTMKr (ORCPT ); Thu, 20 Apr 2023 08:10:47 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C7254234 for ; Thu, 20 Apr 2023 05:10:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992601; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gnXsxCjOlbLfHx8U9ekD+98ki5dPTH3RKv9KHzMjdMo=; b=cGZXx2cimDBWBCiXl9OO3Z21vogQU5qmZ+2YXWJtXv/CAQHskvp9xPgedJSbrFwf166azv cy6pDRSV4qDMffpS6gdr3cBx1j9RVMvL56ChJSXdniEOVrdNpzekeDr7GoC+9QUmn5kppa x/KN+foBY6mwgz9XgkFtVeK2EUmZLRs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-104-GDhS_C3dPpe2GWXLnt97qw-1; Thu, 20 Apr 2023 08:09:57 -0400 X-MC-Unique: GDhS_C3dPpe2GWXLnt97qw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BE4958996FA; Thu, 20 Apr 2023 12:09:56 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id DF86D1121315; Thu, 20 Apr 2023 12:09:55 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Hannes Reinecke , Damien Le Moal , Dmitry Fomichev Subject: [PULL 02/20] block/file-posix: introduce helper functions for sysfs attributes Date: Thu, 20 Apr 2023 08:09:30 -0400 Message-Id: <20230420120948.436661-3-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Use get_sysfs_str_val() to get the string value of device zoned model. Then get_sysfs_zoned_model() can convert it to BlockZoneModel type of QEMU. Use get_sysfs_long_val() to get the long value of zoned device information. Signed-off-by: Sam Li Reviewed-by: Hannes Reinecke Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-3-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé . --Stefan] Signed-off-by: Stefan Hajnoczi --- include/block/block_int-common.h | 3 + block/file-posix.c | 130 ++++++++++++++++++++++--------- 2 files changed, 95 insertions(+), 38 deletions(-) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index f01bb8b617..06dd89031f 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -861,6 +861,9 @@ typedef struct BlockLimits { * an explicit monitor command to load the disk inside the guest). */ bool has_variable_length; + + /* device zone model */ + BlockZoneModel zoned; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; diff --git a/block/file-posix.c b/block/file-posix.c index c2dee3f056..2697cb6699 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1202,15 +1202,93 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st) #endif } -static int hdev_get_max_segments(int fd, struct stat *st) +/* + * Get a sysfs attribute value as character string. + */ +static int get_sysfs_str_val(struct stat *st, const char *attribute, + char **val) { +#ifdef CONFIG_LINUX + g_autofree char *sysfspath = NULL; + int ret; + size_t len; + + if (!S_ISBLK(st->st_mode)) { + return -ENOTSUP; + } + + sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s", + major(st->st_rdev), minor(st->st_rdev), + attribute); + ret = g_file_get_contents(sysfspath, val, &len, NULL); + if (ret == -1) { + return -ENOENT; + } + + /* The file is ended with '\n' */ + char *p; + p = *val; + if (*(p + len - 1) == '\n') { + *(p + len - 1) = '\0'; + } + return ret; +#else + return -ENOTSUP; +#endif +} + +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) +{ + g_autofree char *val = NULL; + int ret; + + ret = get_sysfs_str_val(st, "zoned", &val); + if (ret < 0) { + return ret; + } + + if (strcmp(val, "host-managed") == 0) { + *zoned = BLK_Z_HM; + } else if (strcmp(val, "host-aware") == 0) { + *zoned = BLK_Z_HA; + } else if (strcmp(val, "none") == 0) { + *zoned = BLK_Z_NONE; + } else { + return -ENOTSUP; + } + return 0; +} + +/* + * Get a sysfs attribute value as a long integer. + */ +static long get_sysfs_long_val(struct stat *st, const char *attribute) { #ifdef CONFIG_LINUX - char buf[32]; + g_autofree char *str = NULL; const char *end; - char *sysfspath = NULL; + long val; + int ret; + + ret = get_sysfs_str_val(st, attribute, &str); + if (ret < 0) { + return ret; + } + + /* The file is ended with '\n', pass 'end' to accept that. */ + ret = qemu_strtol(str, &end, 10, &val); + if (ret == 0 && end && *end == '\0') { + ret = val; + } + return ret; +#else + return -ENOTSUP; +#endif +} + +static int hdev_get_max_segments(int fd, struct stat *st) +{ +#ifdef CONFIG_LINUX int ret; - int sysfd = -1; - long max_segments; if (S_ISCHR(st->st_mode)) { if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) { @@ -1218,39 +1296,7 @@ static int hdev_get_max_segments(int fd, struct stat *st) } return -ENOTSUP; } - - if (!S_ISBLK(st->st_mode)) { - return -ENOTSUP; - } - - sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments", - major(st->st_rdev), minor(st->st_rdev)); - sysfd = open(sysfspath, O_RDONLY); - if (sysfd == -1) { - ret = -errno; - goto out; - } - ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1)); - if (ret < 0) { - ret = -errno; - goto out; - } else if (ret == 0) { - ret = -EIO; - goto out; - } - buf[ret] = 0; - /* The file is ended with '\n', pass 'end' to accept that. */ - ret = qemu_strtol(buf, &end, 10, &max_segments); - if (ret == 0 && end && *end == '\n') { - ret = max_segments; - } - -out: - if (sysfd != -1) { - close(sysfd); - } - g_free(sysfspath); - return ret; + return get_sysfs_long_val(st, "max_segments"); #else return -ENOTSUP; #endif @@ -1260,6 +1306,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; struct stat st; + int ret; + BlockZoneModel zoned; s->needs_alignment = raw_needs_alignment(bs); raw_probe_alignment(bs, s->fd, errp); @@ -1297,6 +1345,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.max_hw_iov = ret; } } + + ret = get_sysfs_zoned_model(&st, &zoned); + if (ret < 0) { + zoned = BLK_Z_NONE; + } + bs->bl.zoned = zoned; } static int check_for_dasd(int fd) From patchwork Thu Apr 20 12:09:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218652 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B586EC77B73 for ; Thu, 20 Apr 2023 12:10:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234584AbjDTMK5 (ORCPT ); Thu, 20 Apr 2023 08:10:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234579AbjDTMKz (ORCPT ); Thu, 20 Apr 2023 08:10:55 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 690AA468F for ; Thu, 20 Apr 2023 05:10:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992603; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UasNMmv+QQG6Cg+2/jqDxBRY8IjSm7jPHc6tYGOD5CU=; b=d+dBA0r2DuFTT9uF9ztCcRLzu1oRx5q8Wex1Xi6WgPmZM9UxPsF2v2JewDuqClh3KFpM8f YG+Ey7Q0MaIJrOUFlbeWfB/7zpCkXM+2ceWZ3ebB1GalNkNLv01ldW/So886x7VZPZFVEt JGet/e4d5FynSURoyMitdtfBl2cibUk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-586-_FEb-aCHNpy_Owy85JzgWQ-1; Thu, 20 Apr 2023 08:10:00 -0400 X-MC-Unique: _FEb-aCHNpy_Owy85JzgWQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CCEC585A588; Thu, 20 Apr 2023 12:09:59 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8618E440BC; Thu, 20 Apr 2023 12:09:58 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Hannes Reinecke , Dmitry Fomichev Subject: [PULL 03/20] block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ioctls Date: Thu, 20 Apr 2023 08:09:31 -0400 Message-Id: <20230420120948.436661-4-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Add zoned device option to host_device BlockDriver. It will be presented only for zoned host block devices. By adding zone management operations to the host_block_device BlockDriver, users can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset, reset_all). Qemu-io uses the new APIs to perform zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li Reviewed-by: Hannes Reinecke Reviewed-by: Stefan Hajnoczi Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-4-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé and remove spurious ret = -errno in raw_co_zone_mgmt(). --Stefan] Signed-off-by: Stefan Hajnoczi --- meson.build | 4 + include/block/block-io.h | 9 + include/block/block_int-common.h | 21 ++ include/block/raw-aio.h | 6 +- include/sysemu/block-backend-io.h | 18 ++ block/block-backend.c | 133 +++++++++++++ block/file-posix.c | 306 +++++++++++++++++++++++++++++- block/io.c | 41 ++++ qemu-io-cmds.c | 149 +++++++++++++++ 9 files changed, 684 insertions(+), 3 deletions(-) diff --git a/meson.build b/meson.build index c44d05a13f..a1bc310114 100644 --- a/meson.build +++ b/meson.build @@ -1966,6 +1966,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed()) # has_header config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h')) config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h')) +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h')) config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h')) config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h')) config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h')) @@ -2052,6 +2053,9 @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID', config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM', cc.has_member('struct stat', 'st_atim', prefix: '#include ')) +config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY', + cc.has_member('struct blk_zone', 'capacity', + prefix: '#include ')) # has_type config_host_data.set('CONFIG_IOVEC', diff --git a/include/block/block-io.h b/include/block/block-io.h index 5dab88521d..58f415ab64 100644 --- a/include/block/block-io.h +++ b/include/block/block-io.h @@ -111,6 +111,15 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs); int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes); +/* Report zone information of zone block device. */ +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs, + int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones); +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs, + BlockZoneOp op, + int64_t offset, int64_t len); + bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs); int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map, diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 06dd89031f..4e765050c1 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -712,6 +712,12 @@ struct BlockDriver { int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)( BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos); + int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs, + int64_t offset, unsigned int *nr_zones, + BlockZoneDescriptor *zones); + int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len); + /* removable device specific */ bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)( BlockDriverState *bs); @@ -864,6 +870,21 @@ typedef struct BlockLimits { /* device zone model */ BlockZoneModel zoned; + + /* zone size expressed in bytes */ + uint32_t zone_size; + + /* total number of zones */ + uint32_t nr_zones; + + /* maximum sectors of a zone append write operation */ + int64_t max_append_sectors; + + /* maximum number of open zones */ + int64_t max_open_zones; + + /* maximum number of active zones */ + int64_t max_active_zones; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h index f8cda9df91..eda6a7a253 100644 --- a/include/block/raw-aio.h +++ b/include/block/raw-aio.h @@ -28,6 +28,8 @@ #define QEMU_AIO_WRITE_ZEROES 0x0020 #define QEMU_AIO_COPY_RANGE 0x0040 #define QEMU_AIO_TRUNCATE 0x0080 +#define QEMU_AIO_ZONE_REPORT 0x0100 +#define QEMU_AIO_ZONE_MGMT 0x0200 #define QEMU_AIO_TYPE_MASK \ (QEMU_AIO_READ | \ QEMU_AIO_WRITE | \ @@ -36,7 +38,9 @@ QEMU_AIO_DISCARD | \ QEMU_AIO_WRITE_ZEROES | \ QEMU_AIO_COPY_RANGE | \ - QEMU_AIO_TRUNCATE) + QEMU_AIO_TRUNCATE | \ + QEMU_AIO_ZONE_REPORT | \ + QEMU_AIO_ZONE_MGMT) /* AIO flags */ #define QEMU_AIO_MISALIGNED 0x1000 diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index bb25493ba1..097a7a3851 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -46,6 +46,13 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset, BlockCompletionFunc *cb, void *opaque); BlockAIOCB *blk_aio_flush(BlockBackend *blk, BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones, + BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len, + BlockCompletionFunc *cb, void *opaque); BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes, BlockCompletionFunc *cb, void *opaque); void blk_aio_cancel_async(BlockAIOCB *acb); @@ -186,6 +193,17 @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset, int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset, int64_t bytes, BdrvRequestFlags flags); +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones); +int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones); +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len); +int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len); + int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes); int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset, diff --git a/block/block-backend.c b/block/block-backend.c index 55efc735b4..4475b8b085 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1833,6 +1833,139 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) return ret; } +static void coroutine_fn blk_aio_zone_report_entry(void *opaque) +{ + BlkAioEmAIOCB *acb = opaque; + BlkRwCo *rwco = &acb->rwco; + + rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, + (unsigned int*)acb->bytes,rwco->iobuf); + blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones, + BlockCompletionFunc *cb, void *opaque) +{ + BlkAioEmAIOCB *acb; + Coroutine *co; + IO_CODE(); + + blk_inc_in_flight(blk); + acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); + acb->rwco = (BlkRwCo) { + .blk = blk, + .offset = offset, + .iobuf = zones, + .ret = NOT_DONE, + }; + acb->bytes = (int64_t)nr_zones, + acb->has_returned = false; + + co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); + aio_co_enter(blk_get_aio_context(blk), co); + + acb->has_returned = true; + if (acb->rwco.ret != NOT_DONE) { + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); + } + + return &acb->common; +} + +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque) +{ + BlkAioEmAIOCB *acb = opaque; + BlkRwCo *rwco = &acb->rwco; + + rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf, + rwco->offset, acb->bytes); + blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len, + BlockCompletionFunc *cb, void *opaque) { + BlkAioEmAIOCB *acb; + Coroutine *co; + IO_CODE(); + + blk_inc_in_flight(blk); + acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); + acb->rwco = (BlkRwCo) { + .blk = blk, + .offset = offset, + .iobuf = (void *)op, + .ret = NOT_DONE, + }; + acb->bytes = len; + acb->has_returned = false; + + co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); + aio_co_enter(blk_get_aio_context(blk), co); + + acb->has_returned = true; + if (acb->rwco.ret != NOT_DONE) { + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); + } + + return &acb->common; +} + +/* + * Send a zone_report command. + * offset is a byte offset from the start of the device. No alignment + * required for offset. + * nr_zones represents IN maximum and OUT actual. + */ +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) +{ + int ret; + IO_CODE(); + + blk_inc_in_flight(blk); /* increase before waiting */ + blk_wait_while_drained(blk); + if (!blk_is_available(blk)) { + blk_dec_in_flight(blk); + return -ENOMEDIUM; + } + ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones); + blk_dec_in_flight(blk); + return ret; +} + +/* + * Send a zone_management command. + * op is the zone operation; + * offset is the byte offset from the start of the zoned device; + * len is the maximum number of bytes the command should operate on. It + * should be aligned with the device zone size. + */ +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len) +{ + int ret; + IO_CODE(); + + blk_inc_in_flight(blk); + blk_wait_while_drained(blk); + + ret = blk_check_byte_request(blk, offset, len); + if (ret < 0) { + blk_dec_in_flight(blk); + return ret; + } + + ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len); + blk_dec_in_flight(blk); + return ret; +} + void blk_drain(BlockBackend *blk) { BlockDriverState *bs = blk_bs(blk); diff --git a/block/file-posix.c b/block/file-posix.c index 2697cb6699..2076da293d 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -68,6 +68,9 @@ #include #include #include +#if defined(CONFIG_BLKZONED) +#include +#endif #include #include #include @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData { PreallocMode prealloc; Error **errp; } truncate; + struct { + unsigned int *nr_zones; + BlockZoneDescriptor *zones; + } zone_report; + struct { + unsigned long op; + } zone_mgmt; }; } RawPosixAIOData; @@ -1351,6 +1361,50 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) zoned = BLK_Z_NONE; } bs->bl.zoned = zoned; + if (zoned != BLK_Z_NONE) { + /* + * The zoned device must at least have zone size and nr_zones fields. + */ + ret = get_sysfs_long_val(&st, "chunk_sectors"); + if (ret < 0) { + error_setg_errno(errp, -ret, "Unable to read chunk_sectors " + "sysfs attribute"); + goto out; + } else if (!ret) { + error_setg(errp, "Read 0 from chunk_sectors sysfs attribute"); + goto out; + } + bs->bl.zone_size = ret << BDRV_SECTOR_BITS; + + ret = get_sysfs_long_val(&st, "nr_zones"); + if (ret < 0) { + error_setg_errno(errp, -ret, "Unable to read nr_zones " + "sysfs attribute"); + goto out; + } else if (!ret) { + error_setg(errp, "Read 0 from nr_zones sysfs attribute"); + goto out; + } + bs->bl.nr_zones = ret; + + ret = get_sysfs_long_val(&st, "zone_append_max_bytes"); + if (ret > 0) { + bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS; + } + + ret = get_sysfs_long_val(&st, "max_open_zones"); + if (ret >= 0) { + bs->bl.max_open_zones = ret; + } + + ret = get_sysfs_long_val(&st, "max_active_zones"); + if (ret >= 0) { + bs->bl.max_active_zones = ret; + } + return; + } +out: + bs->bl.zoned = BLK_Z_NONE; } static int check_for_dasd(int fd) @@ -1374,9 +1428,12 @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz) BDRVRawState *s = bs->opaque; int ret; - /* If DASD, get blocksizes */ + /* If DASD or zoned devices, get blocksizes */ if (check_for_dasd(s->fd) < 0) { - return -ENOTSUP; + /* zoned devices are not DASD */ + if (bs->bl.zoned == BLK_Z_NONE) { + return -ENOTSUP; + } } ret = probe_logical_blocksize(s->fd, &bsz->log); if (ret < 0) { @@ -1844,6 +1901,146 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd, } #endif +/* + * parse_zone - Fill a zone descriptor + */ +#if defined(CONFIG_BLKZONED) +static inline int parse_zone(struct BlockZoneDescriptor *zone, + const struct blk_zone *blkz) { + zone->start = blkz->start << BDRV_SECTOR_BITS; + zone->length = blkz->len << BDRV_SECTOR_BITS; + zone->wp = blkz->wp << BDRV_SECTOR_BITS; + +#ifdef HAVE_BLK_ZONE_REP_CAPACITY + zone->cap = blkz->capacity << BDRV_SECTOR_BITS; +#else + zone->cap = blkz->len << BDRV_SECTOR_BITS; +#endif + + switch (blkz->type) { + case BLK_ZONE_TYPE_SEQWRITE_REQ: + zone->type = BLK_ZT_SWR; + break; + case BLK_ZONE_TYPE_SEQWRITE_PREF: + zone->type = BLK_ZT_SWP; + break; + case BLK_ZONE_TYPE_CONVENTIONAL: + zone->type = BLK_ZT_CONV; + break; + default: + error_report("Unsupported zone type: 0x%x", blkz->type); + return -ENOTSUP; + } + + switch (blkz->cond) { + case BLK_ZONE_COND_NOT_WP: + zone->state = BLK_ZS_NOT_WP; + break; + case BLK_ZONE_COND_EMPTY: + zone->state = BLK_ZS_EMPTY; + break; + case BLK_ZONE_COND_IMP_OPEN: + zone->state = BLK_ZS_IOPEN; + break; + case BLK_ZONE_COND_EXP_OPEN: + zone->state = BLK_ZS_EOPEN; + break; + case BLK_ZONE_COND_CLOSED: + zone->state = BLK_ZS_CLOSED; + break; + case BLK_ZONE_COND_READONLY: + zone->state = BLK_ZS_RDONLY; + break; + case BLK_ZONE_COND_FULL: + zone->state = BLK_ZS_FULL; + break; + case BLK_ZONE_COND_OFFLINE: + zone->state = BLK_ZS_OFFLINE; + break; + default: + error_report("Unsupported zone state: 0x%x", blkz->cond); + return -ENOTSUP; + } + return 0; +} +#endif + +#if defined(CONFIG_BLKZONED) +static int handle_aiocb_zone_report(void *opaque) +{ + RawPosixAIOData *aiocb = opaque; + int fd = aiocb->aio_fildes; + unsigned int *nr_zones = aiocb->zone_report.nr_zones; + BlockZoneDescriptor *zones = aiocb->zone_report.zones; + /* zoned block devices use 512-byte sectors */ + uint64_t sector = aiocb->aio_offset / 512; + + struct blk_zone *blkz; + size_t rep_size; + unsigned int nrz; + int ret, n = 0, i = 0; + + nrz = *nr_zones; + rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone); + g_autofree struct blk_zone_report *rep = NULL; + rep = g_malloc(rep_size); + + blkz = (struct blk_zone *)(rep + 1); + while (n < nrz) { + memset(rep, 0, rep_size); + rep->sector = sector; + rep->nr_zones = nrz - n; + + do { + ret = ioctl(fd, BLKREPORTZONE, rep); + } while (ret != 0 && errno == EINTR); + if (ret != 0) { + error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d", + fd, sector, errno); + return -errno; + } + + if (!rep->nr_zones) { + break; + } + + for (i = 0; i < rep->nr_zones; i++, n++) { + ret = parse_zone(&zones[n], &blkz[i]); + if (ret != 0) { + return ret; + } + + /* The next report should start after the last zone reported */ + sector = blkz[i].start + blkz[i].len; + } + } + + *nr_zones = n; + return 0; +} +#endif + +#if defined(CONFIG_BLKZONED) +static int handle_aiocb_zone_mgmt(void *opaque) +{ + RawPosixAIOData *aiocb = opaque; + int fd = aiocb->aio_fildes; + uint64_t sector = aiocb->aio_offset / 512; + int64_t nr_sectors = aiocb->aio_nbytes / 512; + struct blk_zone_range range; + int ret; + + /* Execute the operation */ + range.sector = sector; + range.nr_sectors = nr_sectors; + do { + ret = ioctl(fd, aiocb->zone_mgmt.op, &range); + } while (ret != 0 && errno == EINTR); + + return ret; +} +#endif + static int handle_aiocb_copy_range(void *opaque) { RawPosixAIOData *aiocb = opaque; @@ -3034,6 +3231,104 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret) } } +/* + * zone report - Get a zone block device's information in the form + * of an array of zone descriptors. + * zones is an array of zone descriptors to hold zone information on reply; + * offset can be any byte within the entire size of the device; + * nr_zones is the maxium number of sectors the command should operate on. + */ +#if defined(CONFIG_BLKZONED) +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) { + BDRVRawState *s = bs->opaque; + RawPosixAIOData acb = (RawPosixAIOData) { + .bs = bs, + .aio_fildes = s->fd, + .aio_type = QEMU_AIO_ZONE_REPORT, + .aio_offset = offset, + .zone_report = { + .nr_zones = nr_zones, + .zones = zones, + }, + }; + + return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb); +} +#endif + +/* + * zone management operations - Execute an operation on a zone + */ +#if defined(CONFIG_BLKZONED) +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len) { + BDRVRawState *s = bs->opaque; + RawPosixAIOData acb; + int64_t zone_size, zone_size_mask; + const char *op_name; + unsigned long zo; + int ret; + int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS; + + zone_size = bs->bl.zone_size; + zone_size_mask = zone_size - 1; + if (offset & zone_size_mask) { + error_report("sector offset %" PRId64 " is not aligned to zone size " + "%" PRId64 "", offset / 512, zone_size / 512); + return -EINVAL; + } + + if (((offset + len) < capacity && len & zone_size_mask) || + offset + len > capacity) { + error_report("number of sectors %" PRId64 " is not aligned to zone size" + " %" PRId64 "", len / 512, zone_size / 512); + return -EINVAL; + } + + switch (op) { + case BLK_ZO_OPEN: + op_name = "BLKOPENZONE"; + zo = BLKOPENZONE; + break; + case BLK_ZO_CLOSE: + op_name = "BLKCLOSEZONE"; + zo = BLKCLOSEZONE; + break; + case BLK_ZO_FINISH: + op_name = "BLKFINISHZONE"; + zo = BLKFINISHZONE; + break; + case BLK_ZO_RESET: + op_name = "BLKRESETZONE"; + zo = BLKRESETZONE; + break; + default: + error_report("Unsupported zone op: 0x%x", op); + return -ENOTSUP; + } + + acb = (RawPosixAIOData) { + .bs = bs, + .aio_fildes = s->fd, + .aio_type = QEMU_AIO_ZONE_MGMT, + .aio_offset = offset, + .aio_nbytes = len, + .zone_mgmt = { + .op = zo, + }, + }; + + ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb); + if (ret != 0) { + error_report("ioctl %s failed %d", op_name, ret); + } + + return ret; +} +#endif + static coroutine_fn int raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes, bool blkdev) @@ -3789,6 +4084,13 @@ static BlockDriver bdrv_host_device = { #ifdef __linux__ .bdrv_co_ioctl = hdev_co_ioctl, #endif + + /* zoned device */ +#if defined(CONFIG_BLKZONED) + /* zone management operations */ + .bdrv_co_zone_report = raw_co_zone_report, + .bdrv_co_zone_mgmt = raw_co_zone_mgmt, +#endif }; #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__) diff --git a/block/io.c b/block/io.c index 2e267a85ab..ca941457d4 100644 --- a/block/io.c +++ b/block/io.c @@ -3115,6 +3115,47 @@ out: return co.ret; } +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) +{ + BlockDriver *drv = bs->drv; + CoroutineIOCompletion co = { + .coroutine = qemu_coroutine_self(), + }; + IO_CODE(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) { + co.ret = -ENOTSUP; + goto out; + } + co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len) +{ + BlockDriver *drv = bs->drv; + CoroutineIOCompletion co = { + .coroutine = qemu_coroutine_self(), + }; + IO_CODE(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) { + co.ret = -ENOTSUP; + goto out; + } + co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + void *qemu_blockalign(BlockDriverState *bs, size_t size) { IO_CODE(); diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index e7a02f5b99..f35ea627d7 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1730,6 +1730,150 @@ static const cmdinfo_t flush_cmd = { .oneline = "flush all in-core file state to disk", }; +static inline int64_t tosector(int64_t bytes) +{ + return bytes >> BDRV_SECTOR_BITS; +} + +static int zone_report_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + int64_t offset; + unsigned int nr_zones; + + ++optind; + offset = cvtnum(argv[optind]); + ++optind; + nr_zones = cvtnum(argv[optind]); + + g_autofree BlockZoneDescriptor *zones = NULL; + zones = g_new(BlockZoneDescriptor, nr_zones); + ret = blk_zone_report(blk, offset, &nr_zones, zones); + if (ret < 0) { + printf("zone report failed: %s\n", strerror(-ret)); + } else { + for (int i = 0; i < nr_zones; ++i) { + printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", " + "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", " + "zcond:%u, [type: %u]\n", + tosector(zones[i].start), tosector(zones[i].length), + tosector(zones[i].cap), tosector(zones[i].wp), + zones[i].state, zones[i].type); + } + } + return ret; +} + +static const cmdinfo_t zone_report_cmd = { + .name = "zone_report", + .altname = "zrp", + .cfunc = zone_report_f, + .argmin = 2, + .argmax = 2, + .args = "offset number", + .oneline = "report zone information", +}; + +static int zone_open_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + int64_t offset, len; + ++optind; + offset = cvtnum(argv[optind]); + ++optind; + len = cvtnum(argv[optind]); + ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len); + if (ret < 0) { + printf("zone open failed: %s\n", strerror(-ret)); + } + return ret; +} + +static const cmdinfo_t zone_open_cmd = { + .name = "zone_open", + .altname = "zo", + .cfunc = zone_open_f, + .argmin = 2, + .argmax = 2, + .args = "offset len", + .oneline = "explicit open a range of zones in zone block device", +}; + +static int zone_close_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + int64_t offset, len; + ++optind; + offset = cvtnum(argv[optind]); + ++optind; + len = cvtnum(argv[optind]); + ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len); + if (ret < 0) { + printf("zone close failed: %s\n", strerror(-ret)); + } + return ret; +} + +static const cmdinfo_t zone_close_cmd = { + .name = "zone_close", + .altname = "zc", + .cfunc = zone_close_f, + .argmin = 2, + .argmax = 2, + .args = "offset len", + .oneline = "close a range of zones in zone block device", +}; + +static int zone_finish_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + int64_t offset, len; + ++optind; + offset = cvtnum(argv[optind]); + ++optind; + len = cvtnum(argv[optind]); + ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len); + if (ret < 0) { + printf("zone finish failed: %s\n", strerror(-ret)); + } + return ret; +} + +static const cmdinfo_t zone_finish_cmd = { + .name = "zone_finish", + .altname = "zf", + .cfunc = zone_finish_f, + .argmin = 2, + .argmax = 2, + .args = "offset len", + .oneline = "finish a range of zones in zone block device", +}; + +static int zone_reset_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + int64_t offset, len; + ++optind; + offset = cvtnum(argv[optind]); + ++optind; + len = cvtnum(argv[optind]); + ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len); + if (ret < 0) { + printf("zone reset failed: %s\n", strerror(-ret)); + } + return ret; +} + +static const cmdinfo_t zone_reset_cmd = { + .name = "zone_reset", + .altname = "zrs", + .cfunc = zone_reset_f, + .argmin = 2, + .argmax = 2, + .args = "offset len", + .oneline = "reset a zone write pointer in zone block device", +}; + static int truncate_f(BlockBackend *blk, int argc, char **argv); static const cmdinfo_t truncate_cmd = { .name = "truncate", @@ -2523,6 +2667,11 @@ static void __attribute((constructor)) init_qemuio_commands(void) qemuio_add_command(&aio_write_cmd); qemuio_add_command(&aio_flush_cmd); qemuio_add_command(&flush_cmd); + qemuio_add_command(&zone_report_cmd); + qemuio_add_command(&zone_open_cmd); + qemuio_add_command(&zone_close_cmd); + qemuio_add_command(&zone_finish_cmd); + qemuio_add_command(&zone_reset_cmd); qemuio_add_command(&truncate_cmd); qemuio_add_command(&length_cmd); qemuio_add_command(&info_cmd); From patchwork Thu Apr 20 12:09:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218653 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CA35C77B72 for ; Thu, 20 Apr 2023 12:11:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234548AbjDTMK7 (ORCPT ); Thu, 20 Apr 2023 08:10:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234541AbjDTMK6 (ORCPT ); Thu, 20 Apr 2023 08:10:58 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABB3640C0 for ; Thu, 20 Apr 2023 05:10:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992607; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=o556zBoKl55l9Ra7Kz5hgPULlT1aL1OylaCX0+Xboh0=; b=PS8cmxmSzUmOJVrxmQjIp+tSepZOxuHflEfxMJse7ug3Zrfm9DFWqG8uv6K9y1Os4KN1GJ OM8tLhzPE+y8aFEu9FATKDzqrfeK54Rf4Ze4S1ydH9RJhs83YR8X19KhHMFFy7MqeOZ9oF 8QfPz2ELrOcmoRoBKEwqBjqHzZZCTVo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-231-Z6tE0vZEN2yTlBXOtbk7Tg-1; Thu, 20 Apr 2023 08:10:03 -0400 X-MC-Unique: Z6tE0vZEN2yTlBXOtbk7Tg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AA16888B77E; Thu, 20 Apr 2023 12:10:02 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id E9D422166B33; Thu, 20 Apr 2023 12:10:01 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Damien Le Moal , Hannes Reinecke , Dmitry Fomichev Subject: [PULL 04/20] block/raw-format: add zone operations to pass through requests Date: Thu, 20 Apr 2023 08:09:32 -0400 Message-Id: <20230420120948.436661-5-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li raw-format driver usually sits on top of file-posix driver. It needs to pass through requests of zone commands. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal Reviewed-by: Hannes Reinecke Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-5-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé . --Stefan] Signed-off-by: Stefan Hajnoczi --- block/raw-format.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/block/raw-format.c b/block/raw-format.c index 06b8030d9d..f167448462 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -317,6 +317,21 @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes) return bdrv_co_pdiscard(bs->file, offset, bytes); } +static int coroutine_fn GRAPH_RDLOCK +raw_co_zone_report(BlockDriverState *bs, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) +{ + return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones); +} + +static int coroutine_fn GRAPH_RDLOCK +raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len) +{ + return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len); +} + static int64_t coroutine_fn GRAPH_RDLOCK raw_co_getlength(BlockDriverState *bs) { @@ -619,6 +634,8 @@ BlockDriver bdrv_raw = { .bdrv_co_pwritev = &raw_co_pwritev, .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes, .bdrv_co_pdiscard = &raw_co_pdiscard, + .bdrv_co_zone_report = &raw_co_zone_report, + .bdrv_co_zone_mgmt = &raw_co_zone_mgmt, .bdrv_co_block_status = &raw_co_block_status, .bdrv_co_copy_range_from = &raw_co_copy_range_from, .bdrv_co_copy_range_to = &raw_co_copy_range_to, From patchwork Thu Apr 20 12:09:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218657 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 077EEC77B72 for ; Thu, 20 Apr 2023 12:11:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234579AbjDTMLJ (ORCPT ); Thu, 20 Apr 2023 08:11:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234602AbjDTMLC (ORCPT ); Thu, 20 Apr 2023 08:11:02 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BB023C22 for ; Thu, 20 Apr 2023 05:10:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992613; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iuH4IzkKP9vYI2VNNqhM3FuXIC5qlLpZyHIFACbbeuI=; b=GjMmFAQcvJD28gL0Ws2CZhy3MdWNwFmCIDwc4N8IFmw/JXDYTYoSTqCit+ki1ngZ6JFW7s 2hDEEwiORh6h4nKCoqJ+ePSd6435cLsIuL7O9J1hJbIhIZatH2xfzF5OUC3sAdW7+RWUcJ HMZ/ZAhe7jKWjLDaKB3wVx+d7ZRUcxc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-630-ZZMLslQTOZCaCE-ZBl9p0A-1; Thu, 20 Apr 2023 08:10:06 -0400 X-MC-Unique: ZZMLslQTOZCaCE-ZBl9p0A-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 89B69811E7D; Thu, 20 Apr 2023 12:10:05 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9910E20239E0; Thu, 20 Apr 2023 12:10:04 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Hannes Reinecke , Dmitry Fomichev Subject: [PULL 05/20] block: add zoned BlockDriver check to block layer Date: Thu, 20 Apr 2023 08:09:33 -0400 Message-Id: <20230420120948.436661-6-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Hannes Reinecke Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-6-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé and clarify that the check is about zoned BlockDrivers. --Stefan] Signed-off-by: Stefan Hajnoczi --- include/block/block_int-common.h | 5 +++++ block.c | 19 +++++++++++++++++++ block/file-posix.c | 12 ++++++++++++ block/raw-format.c | 1 + 4 files changed, 37 insertions(+) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 4e765050c1..0164dc76e2 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -137,6 +137,11 @@ struct BlockDriver { */ bool is_format; + /* + * Set to true if the BlockDriver supports zoned children. + */ + bool supports_zoned_children; + /* * Drivers not implementing bdrv_parse_filename nor bdrv_open should have * this field set to true, except ones that are defined only by their diff --git a/block.c b/block.c index d79a52ca74..0a1e79c4fc 100644 --- a/block.c +++ b/block.c @@ -7967,6 +7967,25 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, return; } + /* + * Non-zoned block drivers do not follow zoned storage constraints + * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned + * drivers in a graph. + */ + if (!parent_bs->drv->supports_zoned_children && + child_bs->bl.zoned == BLK_Z_HM) { + /* + * The host-aware model allows zoned storage constraints and random + * write. Allow mixing host-aware and non-zoned drivers. Using + * host-aware device as a regular device. + */ + error_setg(errp, "Cannot add a %s child to a %s parent", + child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned", + parent_bs->drv->supports_zoned_children ? + "support zoned children" : "not support zoned children"); + return; + } + if (!QLIST_EMPTY(&child_bs->parents)) { error_setg(errp, "The node %s already has a parent", child_bs->node_name); diff --git a/block/file-posix.c b/block/file-posix.c index 2076da293d..5c5bef61e2 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, goto fail; } } +#ifdef CONFIG_BLKZONED + /* + * The kernel page cache does not reliably work for writes to SWR zones + * of zoned block device because it can not guarantee the order of writes. + */ + if ((bs->bl.zoned != BLK_Z_NONE) && + (!(s->open_flags & O_DIRECT))) { + error_setg(errp, "The driver supports zoned devices, and it requires " + "cache.direct=on, which was not specified."); + return -EINVAL; /* No host kernel page cache */ + } +#endif if (S_ISBLK(st.st_mode)) { #ifdef __linux__ diff --git a/block/raw-format.c b/block/raw-format.c index f167448462..1a1dce8da4 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -623,6 +623,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c, BlockDriver bdrv_raw = { .format_name = "raw", .instance_size = sizeof(BDRVRawState), + .supports_zoned_children = true, .bdrv_probe = &raw_probe, .bdrv_reopen_prepare = &raw_reopen_prepare, .bdrv_reopen_commit = &raw_reopen_commit, From patchwork Thu Apr 20 12:09:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218656 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8D75C77B76 for ; Thu, 20 Apr 2023 12:11:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234609AbjDTMLI (ORCPT ); Thu, 20 Apr 2023 08:11:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234596AbjDTMLC (ORCPT ); Thu, 20 Apr 2023 08:11:02 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F259B1706 for ; Thu, 20 Apr 2023 05:10:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992612; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gc6CmGleQPVY34YOqLhSarn06z9aG6CKwZ1JpgdLYj4=; b=QkxXSC+NQ4fF/QAOO2nGHAWl6oNL+wpbPpSQdFiTlWkDbih/okk+ZonB5gkTvUM2Wo72Bp h/aDGuxZMFkhn+fXQvb2t+UQRC+h/1TEcect/dRfOq0AnM9FBxoKlKmVKRo0FreCwAXNLW +86B26scPfGNzyJx8aKNTKRNXpgJk7M= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-186-7QFL4mE4MKSi5EH5sWVBww-1; Thu, 20 Apr 2023 08:10:09 -0400 X-MC-Unique: 7QFL4mE4MKSi5EH5sWVBww-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 69FD5858F09; Thu, 20 Apr 2023 12:10:08 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id A91B3492B00; Thu, 20 Apr 2023 12:10:07 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 06/20] iotests: test new zone operations Date: Thu, 20 Apr 2023 08:09:34 -0400 Message-Id: <20230420120948.436661-7-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li The new block layer APIs of zoned block devices can be tested by: $ tests/qemu-iotests/check zoned Run each zone operation on a newly created null_blk device and see whether it outputs the same zone information. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Acked-by: Kevin Wolf Message-id: 20230324090605.28361-7-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé . --Stefan] Signed-off-by: Stefan Hajnoczi --- tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++ tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++ 2 files changed, 142 insertions(+) create mode 100755 tests/qemu-iotests/tests/zoned create mode 100644 tests/qemu-iotests/tests/zoned.out diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned new file mode 100755 index 0000000000..56f60616b5 --- /dev/null +++ b/tests/qemu-iotests/tests/zoned @@ -0,0 +1,89 @@ +#!/usr/bin/env bash +# +# Test zone management operations. +# + +seq="$(basename $0)" +echo "QA output created by $seq" +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img + sudo -n rmmod null_blk +} +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# get standard environment, filters and checks +. ../common.rc +. ../common.filter +. ../common.qemu + +# This test only runs on Linux hosts with raw image files. +_supported_fmt raw +_supported_proto file +_supported_os Linux + +sudo -n true || \ + _notrun 'Password-less sudo required' + +IMG="--image-opts -n driver=host_device,filename=/dev/nullb0" +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT + +echo "Testing a null_blk device:" +echo "case 1: if the operations work" +sudo -n modprobe null_blk nr_devices=1 zoned=1 +sudo -n chmod 0666 /dev/nullb0 + +echo "(1) report the first zone:" +$QEMU_IO $IMG -c "zrp 0 1" +echo +echo "report the first 10 zones" +$QEMU_IO $IMG -c "zrp 0 10" +echo +echo "report the last zone:" +$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000 +echo +echo +echo "(2) opening the first zone" +$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288 +echo "report after:" +$QEMU_IO $IMG -c "zrp 0 1" +echo +echo "opening the second zone" +$QEMU_IO $IMG -c "zo 268435456 268435456" # +echo "report after:" +$QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo "opening the last zone" +$QEMU_IO $IMG -c "zo 0x3e70000000 268435456" +echo "report after:" +$QEMU_IO $IMG -c "zrp 0x3e70000000 2" +echo +echo +echo "(3) closing the first zone" +$QEMU_IO $IMG -c "zc 0 268435456" +echo "report after:" +$QEMU_IO $IMG -c "zrp 0 1" +echo +echo "closing the last zone" +$QEMU_IO $IMG -c "zc 0x3e70000000 268435456" +echo "report after:" +$QEMU_IO $IMG -c "zrp 0x3e70000000 2" +echo +echo +echo "(4) finishing the second zone" +$QEMU_IO $IMG -c "zf 268435456 268435456" +echo "After finishing a zone:" +$QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo +echo "(5) resetting the second zone" +$QEMU_IO $IMG -c "zrs 268435456 268435456" +echo "After resetting a zone:" +$QEMU_IO $IMG -c "zrp 268435456 1" + +# success, all done +echo "*** done" +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out new file mode 100644 index 0000000000..b2d061da49 --- /dev/null +++ b/tests/qemu-iotests/tests/zoned.out @@ -0,0 +1,53 @@ +QA output created by zoned +Testing a null_blk device: +case 1: if the operations work +(1) report the first zone: +start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2] + +report the first 10 zones +start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2] +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2] +start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2] +start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2] +start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2] +start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2] +start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2] +start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2] +start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2] +start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2] + +report the last zone: +start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2] + + +(2) opening the first zone +report after: +start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2] + +opening the second zone +report after: +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2] + +opening the last zone +report after: +start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2] + + +(3) closing the first zone +report after: +start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2] + +closing the last zone +report after: +start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2] + + +(4) finishing the second zone +After finishing a zone: +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2] + + +(5) resetting the second zone +After resetting a zone: +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2] +*** done From patchwork Thu Apr 20 12:09:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218655 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 236C4C77B73 for ; Thu, 20 Apr 2023 12:11:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234619AbjDTMLG (ORCPT ); Thu, 20 Apr 2023 08:11:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234585AbjDTMLA (ORCPT ); Thu, 20 Apr 2023 08:11:00 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D52424695 for ; Thu, 20 Apr 2023 05:10:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992615; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bzP2l0WT/BeijcUHX+/L8vv2O8887qjbvkE/lDDIehc=; b=Vi2wgvCNzH3l83URoDy9xkxL7qMFJimjkDB7kVzwpcIGAWg6jhXuGQ7iiOC4bSYraKEXC1 4YSrPkwG1F9reC17TGNiOXXqq8rndQUz0pstpLbTVrRZh8Wn5OMMSIZrJa0OlMlehaKygp eJNsj7QybfaLBIsXPJqUWox06o4aBoY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-20-X3cgm63KNE-brf5guvDyLg-1; Thu, 20 Apr 2023 08:10:12 -0400 X-MC-Unique: X3cgm63KNE-brf5guvDyLg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5CF3B85A588; Thu, 20 Apr 2023 12:10:11 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E4D2C16024; Thu, 20 Apr 2023 12:10:09 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Dmitry Fomichev Subject: [PULL 07/20] block: add some trace events for new block layer APIs Date: Thu, 20 Apr 2023 08:09:35 -0400 Message-Id: <20230420120948.436661-8-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-8-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- block/file-posix.c | 3 +++ block/trace-events | 2 ++ 2 files changed, 5 insertions(+) diff --git a/block/file-posix.c b/block/file-posix.c index 5c5bef61e2..88ec87de21 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -3266,6 +3266,7 @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset, }, }; + trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS); return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb); } #endif @@ -3332,6 +3333,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, }, }; + trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS, + len >> BDRV_SECTOR_BITS); ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb); if (ret != 0) { error_report("ioctl %s failed %d", op_name, ret); diff --git a/block/trace-events b/block/trace-events index 48dbf10c66..3f4e1d088a 100644 --- a/block/trace-events +++ b/block/trace-events @@ -209,6 +209,8 @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s" file_setup_cdrom(const char *partition) "Using %s as optical disc" file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d" file_flush_fdatasync_failed(int err) "errno %d" +zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 "" +zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors" # ssh.c sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)" From patchwork Thu Apr 20 12:09:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218654 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43DE7C77B72 for ; Thu, 20 Apr 2023 12:11:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234601AbjDTMLH (ORCPT ); Thu, 20 Apr 2023 08:11:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234580AbjDTMLA (ORCPT ); Thu, 20 Apr 2023 08:11:00 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D5E946B2 for ; Thu, 20 Apr 2023 05:10:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992617; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dumx9OyXSJM0VXHbbWlRStbm/s7uCqu9hlgSJmvoF10=; b=B02Qp4+TFJBqrmQjvjFZSPPCPsaD8hqQjeNHh212+nWLW1IcQ9DWRK+0lh3AiYi4TVjqsZ IyYP1eumkbil5zEMiLSF4/s0KgwD32xIr71AphFj7cC5kAjf12BibFvN8FG4zfv+IqnkTl VYQppJrvCm73ZcVc20WK7PV+rAzQ5Jw= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-265-2125kGK8NDm6sKHD_QAVyQ-1; Thu, 20 Apr 2023 08:10:14 -0400 X-MC-Unique: 2125kGK8NDm6sKHD_QAVyQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D75173C0C8A6; Thu, 20 Apr 2023 12:10:13 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3B0F82166B33; Thu, 20 Apr 2023 12:10:12 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Damien Le Moal , Dmitry Fomichev Subject: [PULL 08/20] docs/zoned-storage: add zoned device documentation Date: Thu, 20 Apr 2023 08:09:36 -0400 Message-Id: <20230420120948.436661-9-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Add the documentation about the zoned device support to virtio-blk emulation. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal Reviewed-by: Dmitry Fomichev Acked-by: Kevin Wolf Message-id: 20230324090605.28361-9-faithilikerun@gmail.com [Add index-api.rst to fix "zoned-storage.rst:document isn't included in any toctree" error. --Stefan] Signed-off-by: Stefan Hajnoczi --- docs/devel/index-api.rst | 1 + docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++ docs/system/qemu-block-drivers.rst.inc | 6 ++++ 3 files changed, 50 insertions(+) create mode 100644 docs/devel/zoned-storage.rst diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst index 60c0d7459d..7108821746 100644 --- a/docs/devel/index-api.rst +++ b/docs/devel/index-api.rst @@ -12,3 +12,4 @@ generated from in-code annotations to function prototypes. memory modules ui + zoned-storage diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst new file mode 100644 index 0000000000..6a36133e51 --- /dev/null +++ b/docs/devel/zoned-storage.rst @@ -0,0 +1,43 @@ +============= +zoned-storage +============= + +Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones +that are larger than the LBA size. They can only allow sequential writes, which +can reduce write amplification in SSDs, and potentially lead to higher +throughput and increased capacity. More details about ZBDs can be found at: + +https://zonedstorage.io/docs/introduction/zoned-storage + +1. Block layer APIs for zoned storage +------------------------------------- +QEMU block layer supports three zoned storage models: +- BLK_Z_HM: The host-managed zoned model only allows sequential writes access +to zones. It supports ZBD-specific I/O commands that can be used by a host to +manage the zones of a device. +- BLK_Z_HA: The host-aware zoned model allows random write operations in +zones, making it backward compatible with regular block devices. +- BLK_Z_NONE: The non-zoned model has no zones support. It includes both +regular and drive-managed ZBD devices. ZBD-specific I/O commands are not +supported. + +The block device information resides inside BlockDriverState. QEMU uses +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the +block layer while processing I/O requests. A BlockBackend has a root pointer to +a BlockDriverState graph(for example, raw format on top of file-posix). The +zoned storage information can be propagated from the leaf BlockDriverState all +the way up to the BlockBackend. If the zoned storage model in file-posix is +set to BLK_Z_HM, then block drivers will declare support for zoned host device. + +The block layer APIs support commands needed for zoned storage devices, +including report zones, four zone operations, and zone append. + +2. Emulating zoned storage controllers +-------------------------------------- +When the BlockBackend's BlockLimits model reports a zoned storage device, users +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer +APIs for zoned storage emulation or testing. + +For example, to test zone_report on a null_blk device using qemu-io is: +$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 +-c "zrp offset nr_zones" diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc index dfe5d2293d..105cb9679c 100644 --- a/docs/system/qemu-block-drivers.rst.inc +++ b/docs/system/qemu-block-drivers.rst.inc @@ -430,6 +430,12 @@ Hard disks you may corrupt your host data (use the ``-snapshot`` command line option or modify the device permissions accordingly). +Zoned block devices + Zoned block devices can be passed through to the guest if the emulated storage + controller supports zoned storage. Use ``--blockdev host_device, + node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through + ``/dev/nullb0`` as ``drive0``. + Windows ^^^^^^^ From patchwork Thu Apr 20 12:09:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7EE86C77B76 for ; Thu, 20 Apr 2023 12:11:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234682AbjDTMLP (ORCPT ); Thu, 20 Apr 2023 08:11:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234595AbjDTMLK (ORCPT ); Thu, 20 Apr 2023 08:11:10 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3EF949CF for ; Thu, 20 Apr 2023 05:10:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992621; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DyRg2nO17XfSte2Qiq60HFpJmMSh6St95+0xBJuJYUw=; b=GPMaIqXJm5ZYCIo7nMs+pPZsyuUj8C66ISyLECFYzcGwBRdaO6PRep8cjP284v1eVlvtZw uSdZM3hjKHJSKAmnrNrosEpUrxzfQ2cIcHKnapFJ2OGPRLp1dIDCMJ5NJ0EQB0ZlMZvN7w KWDiptU6xH1WlflFSd5Lkna1YFEPr+Y= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-283-ugQ21I8BN42FIut2abyb8g-1; Thu, 20 Apr 2023 08:10:16 -0400 X-MC-Unique: ugQ21I8BN42FIut2abyb8g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6759D800B35; Thu, 20 Apr 2023 12:10:16 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 909FB1410F1C; Thu, 20 Apr 2023 12:10:15 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta Subject: [PULL 09/20] block/dmg: Declare a type definition for DMG uncompress function Date: Thu, 20 Apr 2023 08:09:37 -0400 Message-Id: <20230420120948.436661-10-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Philippe Mathieu-Daudé Introduce the BdrvDmgUncompressFunc type defintion. To emphasis dmg_uncompress_bz2 and dmg_uncompress_lzfse are pointer to functions, declare them using this new typedef. Signed-off-by: Philippe Mathieu-Daudé Message-id: 20230320152610.32052-1-philmd@linaro.org Signed-off-by: Stefan Hajnoczi --- block/dmg.h | 8 ++++---- block/dmg.c | 7 ++----- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/block/dmg.h b/block/dmg.h index e488601b62..dcd6165e63 100644 --- a/block/dmg.h +++ b/block/dmg.h @@ -51,10 +51,10 @@ typedef struct BDRVDMGState { z_stream zstream; } BDRVDMGState; -extern int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in, - char *next_out, unsigned int avail_out); +typedef int BdrvDmgUncompressFunc(char *next_in, unsigned int avail_in, + char *next_out, unsigned int avail_out); -extern int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in, - char *next_out, unsigned int avail_out); +extern BdrvDmgUncompressFunc *dmg_uncompress_bz2; +extern BdrvDmgUncompressFunc *dmg_uncompress_lzfse; #endif diff --git a/block/dmg.c b/block/dmg.c index e10b9a2ba5..2769900359 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -31,11 +31,8 @@ #include "qemu/memalign.h" #include "dmg.h" -int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in, - char *next_out, unsigned int avail_out); - -int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in, - char *next_out, unsigned int avail_out); +BdrvDmgUncompressFunc *dmg_uncompress_bz2; +BdrvDmgUncompressFunc *dmg_uncompress_lzfse; enum { /* Limit chunk sizes to prevent unreasonable amounts of memory being used From patchwork Thu Apr 20 12:09:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218658 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C488C77B72 for ; Thu, 20 Apr 2023 12:11:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234611AbjDTMLN (ORCPT ); Thu, 20 Apr 2023 08:11:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234591AbjDTMLK (ORCPT ); Thu, 20 Apr 2023 08:11:10 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C468749C4 for ; Thu, 20 Apr 2023 05:10:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992622; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eqZK9/+gQ+RYD/sl35HuMKvZlCTpzsr62KI4FGTu1OM=; b=Ff1dWOXbeldL9Lgr9K1DoT3HeuTjwqK1o+oFS/Yj0fw6OrNwzI0qDvgeIy72W6/uhmz2ow QNCsCmJulWe1OQb1FSgtl03AGrhZkUygSpy448gpyMK7jCWTQUWJlkgClPL1uWtFNj48QC zRtA2imOhL7qAp1b0lPV0W77wh3wDTA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-552-xidT4ylsMWO2qjKet-XB9g-1; Thu, 20 Apr 2023 08:10:19 -0400 X-MC-Unique: xidT4ylsMWO2qjKet-XB9g-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2CF1C185A78B; Thu, 20 Apr 2023 12:10:19 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6FFC42026D16; Thu, 20 Apr 2023 12:10:18 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Thomas De Schampheleire Subject: [PULL 10/20] tracetool: use relative paths for '#line' preprocessor directives Date: Thu, 20 Apr 2023 08:09:38 -0400 Message-Id: <20230420120948.436661-11-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas De Schampheleire The event filename is an absolute path. Convert it to a relative path when writing '#line' directives, to preserve reproducibility of the generated output when different base paths are used. Signed-off-by: Thomas De Schampheleire Signed-off-by: Stefan Hajnoczi Message-Id: <20230406080045.21696-1-thomas.de_schampheleire@nokia.com> --- scripts/tracetool/backend/ftrace.py | 4 +++- scripts/tracetool/backend/log.py | 4 +++- scripts/tracetool/backend/syslog.py | 4 +++- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py index 5fa30ccc08..baed2ae61c 100644 --- a/scripts/tracetool/backend/ftrace.py +++ b/scripts/tracetool/backend/ftrace.py @@ -12,6 +12,8 @@ __email__ = "stefanha@redhat.com" +import os.path + from tracetool import out @@ -45,7 +47,7 @@ def generate_h(event, group): args=event.args, event_id="TRACE_" + event.name.upper(), event_lineno=event.lineno, - event_filename=event.filename, + event_filename=os.path.relpath(event.filename), fmt=event.fmt.rstrip("\n"), argnames=argnames) diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py index 17ba1cd90e..de27b7e62e 100644 --- a/scripts/tracetool/backend/log.py +++ b/scripts/tracetool/backend/log.py @@ -12,6 +12,8 @@ __email__ = "stefanha@redhat.com" +import os.path + from tracetool import out @@ -53,7 +55,7 @@ def generate_h(event, group): ' }', cond=cond, event_lineno=event.lineno, - event_filename=event.filename, + event_filename=os.path.relpath(event.filename), name=event.name, fmt=event.fmt.rstrip("\n"), argnames=argnames) diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py index 5a3a00fe31..012970f6cc 100644 --- a/scripts/tracetool/backend/syslog.py +++ b/scripts/tracetool/backend/syslog.py @@ -12,6 +12,8 @@ __email__ = "stefanha@redhat.com" +import os.path + from tracetool import out @@ -41,7 +43,7 @@ def generate_h(event, group): ' }', cond=cond, event_lineno=event.lineno, - event_filename=event.filename, + event_filename=os.path.relpath(event.filename), name=event.name, fmt=event.fmt.rstrip("\n"), argnames=argnames) From patchwork Thu Apr 20 12:09:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218660 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58A44C77B72 for ; Thu, 20 Apr 2023 12:11:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234602AbjDTMLX (ORCPT ); Thu, 20 Apr 2023 08:11:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234687AbjDTMLQ (ORCPT ); Thu, 20 Apr 2023 08:11:16 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E5314C00 for ; Thu, 20 Apr 2023 05:10:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992628; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Zv4qFHHFhKR8pNTeKmXr5iiv0yWC3pqkRu90BlIzQI0=; b=iqLpIBX1RGyueWetMwBvAcREFI9xa5tXzwcRIZ7p2V8JRkyHkxijztK9wHHONUudQcwStU p1RbpEbW0Be6hzx6iSwBtnIYdFIP8+h8vtGAHhAcP6slcnlQ1LlODwYbcwJg/b7oSxuw+b dZ+QrVawZVZf9H+K67rM4qVcpC1XfPk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-596-5isHoiuePLGfWf_qg1iF4Q-1; Thu, 20 Apr 2023 08:10:22 -0400 X-MC-Unique: 5isHoiuePLGfWf_qg1iF4Q-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 23B51811E7C; Thu, 20 Apr 2023 12:10:22 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 23D521121315; Thu, 20 Apr 2023 12:10:20 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 11/20] file-posix: add tracking of the zone write pointers Date: Thu, 20 Apr 2023 08:09:39 -0400 Message-Id: <20230420120948.436661-12-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Since Linux doesn't have a user API to issue zone append operations to zoned devices from user space, the file-posix driver is modified to add zone append emulation using regular writes. To do this, the file-posix driver tracks the wp location of all zones of the device. It uses an array of uint64_t. The most significant bit of each wp location indicates if the zone type is conventional zones. The zones wp can be changed due to the following operations issued: - zone reset: change the wp to the start offset of that zone - zone finish: change to the end location of that zone - write to a zone - zone append Signed-off-by: Sam Li Message-id: 20230407081657.17947-2-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- include/block/block-common.h | 14 +++ include/block/block_int-common.h | 5 + block/file-posix.c | 173 ++++++++++++++++++++++++++++++- 3 files changed, 189 insertions(+), 3 deletions(-) diff --git a/include/block/block-common.h b/include/block/block-common.h index 1576fcf2ed..93196229ac 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -118,6 +118,14 @@ typedef struct BlockZoneDescriptor { BlockZoneState state; } BlockZoneDescriptor; +/* + * Track write pointers of a zone in bytes. + */ +typedef struct BlockZoneWps { + CoMutex colock; + uint64_t wp[]; +} BlockZoneWps; + typedef struct BlockDriverInfo { /* in bytes, 0 if irrelevant */ int cluster_size; @@ -240,6 +248,12 @@ typedef enum { #define BDRV_SECTOR_BITS 9 #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS) +/* + * Get the first most significant bit of wp. If it is zero, then + * the zone type is SWR. + */ +#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63)) + #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \ INT_MAX >> BDRV_SECTOR_BITS) #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 0164dc76e2..fba95181bd 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -890,6 +890,8 @@ typedef struct BlockLimits { /* maximum number of active zones */ int64_t max_active_zones; + + int64_t write_granularity; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; @@ -1251,6 +1253,9 @@ struct BlockDriverState { CoMutex bsc_modify_lock; /* Always non-NULL, but must only be dereferenced under an RCU read guard */ BdrvBlockStatusCache *block_status_cache; + + /* array of write pointers' location of each zone in the zoned device. */ + BlockZoneWps *wps; }; struct BlockBackendRootState { diff --git a/block/file-posix.c b/block/file-posix.c index 88ec87de21..e6d0df052e 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1324,6 +1324,90 @@ static int hdev_get_max_segments(int fd, struct stat *st) #endif } +#if defined(CONFIG_BLKZONED) +/* + * If the reset_all flag is true, then the wps of zone whose state is + * not readonly or offline should be all reset to the start sector. + * Else, take the real wp of the device. + */ +static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset, + unsigned int nrz, bool reset_all) +{ + struct blk_zone *blkz; + size_t rep_size; + uint64_t sector = offset >> BDRV_SECTOR_BITS; + BlockZoneWps *wps = bs->wps; + int j = offset / bs->bl.zone_size; + int ret, n = 0, i = 0; + rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone); + g_autofree struct blk_zone_report *rep = NULL; + + rep = g_malloc(rep_size); + blkz = (struct blk_zone *)(rep + 1); + while (n < nrz) { + memset(rep, 0, rep_size); + rep->sector = sector; + rep->nr_zones = nrz - n; + + do { + ret = ioctl(fd, BLKREPORTZONE, rep); + } while (ret != 0 && errno == EINTR); + if (ret != 0) { + error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d", + fd, offset, errno); + return -errno; + } + + if (!rep->nr_zones) { + break; + } + + for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) { + /* + * The wp tracking cares only about sequential writes required and + * sequential write preferred zones so that the wp can advance to + * the right location. + * Use the most significant bit of the wp location to indicate the + * zone type: 0 for SWR/SWP zones and 1 for conventional zones. + */ + if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) { + wps->wp[j] |= 1ULL << 63; + } else { + switch(blkz[i].cond) { + case BLK_ZONE_COND_FULL: + case BLK_ZONE_COND_READONLY: + /* Zone not writable */ + wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS; + break; + case BLK_ZONE_COND_OFFLINE: + /* Zone not writable nor readable */ + wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS; + break; + default: + if (reset_all) { + wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS; + } else { + wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS; + } + break; + } + } + } + sector = blkz[i - 1].start + blkz[i - 1].len; + } + + return 0; +} + +static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset, + unsigned int nrz) +{ + if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) { + error_report("update zone wp failed"); + } +} +#endif + static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; @@ -1413,6 +1497,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) if (ret >= 0) { bs->bl.max_active_zones = ret; } + + ret = get_sysfs_long_val(&st, "physical_block_size"); + if (ret >= 0) { + bs->bl.write_granularity = ret; + } + + /* The refresh_limits() function can be called multiple times. */ + g_free(bs->wps); + bs->wps = g_malloc(sizeof(BlockZoneWps) + + sizeof(int64_t) * bs->bl.nr_zones); + ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0); + if (ret < 0) { + error_setg_errno(errp, -ret, "report wps failed"); + bs->wps = NULL; + return; + } + qemu_co_mutex_init(&bs->wps->colock); return; } out: @@ -2338,9 +2439,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, { BDRVRawState *s = bs->opaque; RawPosixAIOData acb; + int ret; if (fd_open(bs) < 0) return -EIO; +#if defined(CONFIG_BLKZONED) + if (type & QEMU_AIO_WRITE && bs->wps) { + qemu_co_mutex_lock(&bs->wps->colock); + } +#endif /* * When using O_DIRECT, the request must be aligned to be able to use @@ -2354,14 +2461,16 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, } else if (s->use_linux_io_uring) { LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs)); assert(qiov->size == bytes); - return luring_co_submit(bs, aio, s->fd, offset, qiov, type); + ret = luring_co_submit(bs, aio, s->fd, offset, qiov, type); + goto out; #endif #ifdef CONFIG_LINUX_AIO } else if (s->use_linux_aio) { LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs)); assert(qiov->size == bytes); - return laio_co_submit(bs, aio, s->fd, offset, qiov, type, + ret = laio_co_submit(bs, aio, s->fd, offset, qiov, type, s->aio_max_batch); + goto out; #endif } @@ -2378,7 +2487,32 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, }; assert(qiov->size == bytes); - return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb); + ret = raw_thread_pool_submit(bs, handle_aiocb_rw, &acb); + +out: +#if defined(CONFIG_BLKZONED) + BlockZoneWps *wps = bs->wps; + if (ret == 0) { + if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) { + uint64_t *wp = &wps->wp[offset / bs->bl.zone_size]; + if (!BDRV_ZT_IS_CONV(*wp)) { + /* Advance the wp if needed */ + if (offset + bytes > *wp) { + *wp = offset + bytes; + } + } + } + } else { + if (type & QEMU_AIO_WRITE) { + update_zones_wp(bs, s->fd, 0, 1); + } + } + + if (type & QEMU_AIO_WRITE && wps) { + qemu_co_mutex_unlock(&wps->colock); + } +#endif + return ret; } static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset, @@ -2486,6 +2620,9 @@ static void raw_close(BlockDriverState *bs) BDRVRawState *s = bs->opaque; if (s->fd >= 0) { +#if defined(CONFIG_BLKZONED) + g_free(bs->wps); +#endif qemu_close(s->fd); s->fd = -1; } @@ -3283,6 +3420,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, const char *op_name; unsigned long zo; int ret; + BlockZoneWps *wps = bs->wps; int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS; zone_size = bs->bl.zone_size; @@ -3300,6 +3438,15 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, return -EINVAL; } + QEMU_LOCK_GUARD(&wps->colock); + uint32_t i = offset / bs->bl.zone_size; + uint32_t nrz = len / bs->bl.zone_size; + uint64_t *wp = &wps->wp[i]; + if (BDRV_ZT_IS_CONV(*wp) && len != capacity) { + error_report("zone mgmt operations are not allowed for conventional zones"); + return -EIO; + } + switch (op) { case BLK_ZO_OPEN: op_name = "BLKOPENZONE"; @@ -3337,7 +3484,27 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, len >> BDRV_SECTOR_BITS); ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb); if (ret != 0) { + update_zones_wp(bs, s->fd, offset, i); error_report("ioctl %s failed %d", op_name, ret); + return ret; + } + + if (zo == BLKRESETZONE && len == capacity) { + ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1); + if (ret < 0) { + error_report("reporting single wp failed"); + return ret; + } + } else if (zo == BLKRESETZONE) { + for (int j = 0; j < nrz; ++j) { + wp[j] = offset + j * zone_size; + } + } else if (zo == BLKFINISHZONE) { + for (int j = 0; j < nrz; ++j) { + /* The zoned device allows the last zone smaller that the + * zone size. */ + wp[j] = MIN(offset + (j + 1) * zone_size, offset + len); + } } return ret; From patchwork Thu Apr 20 12:09:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218662 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7C7EC77B7C for ; Thu, 20 Apr 2023 12:11:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234686AbjDTML0 (ORCPT ); Thu, 20 Apr 2023 08:11:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234676AbjDTMLV (ORCPT ); Thu, 20 Apr 2023 08:11:21 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CF5846A6 for ; Thu, 20 Apr 2023 05:10:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992628; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SvNJKi4ELWUGE+VZkyWsO5nnxyITi0Li4kDv2KjmLbo=; b=VnPLiAiScGeuQ9P3mDsk8apScTcRsFex32u18Lwcb8Gcc8+5nyuyVvLBUEM3d53LGz1nKz 83Y434k6f7q4D5h1fCGwkrtFAyXGt7hQr8yUcJ0SlneqjzwRKyD90GDv8UPEum5RJl+YWU P8724329VEEkZ9QayScLs6QxztfeA6c= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-313-bUfeuD0VOued47atnMt-_w-1; Thu, 20 Apr 2023 08:10:25 -0400 X-MC-Unique: bUfeuD0VOued47atnMt-_w-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 200FA811E7E; Thu, 20 Apr 2023 12:10:25 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3B4B61121315; Thu, 20 Apr 2023 12:10:23 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Dmitry Fomichev Subject: [PULL 12/20] block: introduce zone append write for zoned devices Date: Thu, 20 Apr 2023 08:09:40 -0400 Message-Id: <20230420120948.436661-13-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li A zone append command is a write operation that specifies the first logical block of a zone as the write position. When writing to a zoned block device using zone append, the byte offset of the call may point at any position within the zone to which the data is being appended. Upon completion the device will respond with the position where the data has been written in the zone. Signed-off-by: Sam Li Reviewed-by: Dmitry Fomichev Reviewed-by: Stefan Hajnoczi Message-id: 20230407081657.17947-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- include/block/block-io.h | 4 +++ include/block/block_int-common.h | 3 ++ include/block/raw-aio.h | 4 ++- include/sysemu/block-backend-io.h | 9 +++++ block/block-backend.c | 60 +++++++++++++++++++++++++++++++ block/file-posix.c | 58 ++++++++++++++++++++++++++---- block/io.c | 27 ++++++++++++++ block/io_uring.c | 4 +++ block/linux-aio.c | 3 ++ block/raw-format.c | 8 +++++ 10 files changed, 172 insertions(+), 8 deletions(-) diff --git a/include/block/block-io.h b/include/block/block-io.h index 58f415ab64..f44e524a1c 100644 --- a/include/block/block-io.h +++ b/include/block/block-io.h @@ -119,6 +119,10 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs, int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, int64_t offset, int64_t len); +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs, + int64_t *offset, + QEMUIOVector *qiov, + BdrvRequestFlags flags); bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs); int bdrv_block_status(BlockDriverState *bs, int64_t offset, diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index fba95181bd..06aea47480 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -722,6 +722,9 @@ struct BlockDriver { BlockZoneDescriptor *zones); int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op, int64_t offset, int64_t len); + int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs, + int64_t *offset, QEMUIOVector *qiov, + BdrvRequestFlags flags); /* removable device specific */ bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)( diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h index eda6a7a253..fb9c9f5a01 100644 --- a/include/block/raw-aio.h +++ b/include/block/raw-aio.h @@ -30,6 +30,7 @@ #define QEMU_AIO_TRUNCATE 0x0080 #define QEMU_AIO_ZONE_REPORT 0x0100 #define QEMU_AIO_ZONE_MGMT 0x0200 +#define QEMU_AIO_ZONE_APPEND 0x0400 #define QEMU_AIO_TYPE_MASK \ (QEMU_AIO_READ | \ QEMU_AIO_WRITE | \ @@ -40,7 +41,8 @@ QEMU_AIO_COPY_RANGE | \ QEMU_AIO_TRUNCATE | \ QEMU_AIO_ZONE_REPORT | \ - QEMU_AIO_ZONE_MGMT) + QEMU_AIO_ZONE_MGMT | \ + QEMU_AIO_ZONE_APPEND) /* AIO flags */ #define QEMU_AIO_MISALIGNED 0x1000 diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index 097a7a3851..3122017bdf 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -53,6 +53,9 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len, BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset, + QEMUIOVector *qiov, BdrvRequestFlags flags, + BlockCompletionFunc *cb, void *opaque); BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes, BlockCompletionFunc *cb, void *opaque); void blk_aio_cancel_async(BlockAIOCB *acb); @@ -203,6 +206,12 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len); int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op, int64_t offset, int64_t len); +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset, + QEMUIOVector *qiov, + BdrvRequestFlags flags); +int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset, + QEMUIOVector *qiov, + BdrvRequestFlags flags); int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes); diff --git a/block/block-backend.c b/block/block-backend.c index 4475b8b085..6b2b92b7ff 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1915,6 +1915,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, return &acb->common; } +static void coroutine_fn blk_aio_zone_append_entry(void *opaque) +{ + BlkAioEmAIOCB *acb = opaque; + BlkRwCo *rwco = &acb->rwco; + + rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)acb->bytes, + rwco->iobuf, rwco->flags); + blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset, + QEMUIOVector *qiov, BdrvRequestFlags flags, + BlockCompletionFunc *cb, void *opaque) { + BlkAioEmAIOCB *acb; + Coroutine *co; + IO_CODE(); + + blk_inc_in_flight(blk); + acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); + acb->rwco = (BlkRwCo) { + .blk = blk, + .ret = NOT_DONE, + .flags = flags, + .iobuf = qiov, + }; + acb->bytes = (int64_t)offset; + acb->has_returned = false; + + co = qemu_coroutine_create(blk_aio_zone_append_entry, acb); + aio_co_enter(blk_get_aio_context(blk), co); + acb->has_returned = true; + if (acb->rwco.ret != NOT_DONE) { + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); + } + + return &acb->common; +} + /* * Send a zone_report command. * offset is a byte offset from the start of the device. No alignment @@ -1966,6 +2005,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, return ret; } +/* + * Send a zone_append command. + */ +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset, + QEMUIOVector *qiov, BdrvRequestFlags flags) +{ + int ret; + IO_CODE(); + + blk_inc_in_flight(blk); + blk_wait_while_drained(blk); + if (!blk_is_available(blk)) { + blk_dec_in_flight(blk); + return -ENOMEDIUM; + } + + ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags); + blk_dec_in_flight(blk); + return ret; +} + void blk_drain(BlockBackend *blk) { BlockDriverState *bs = blk_bs(blk); diff --git a/block/file-posix.c b/block/file-posix.c index e6d0df052e..444b34dad2 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -160,6 +160,7 @@ typedef struct BDRVRawState { bool has_write_zeroes:1; bool use_linux_aio:1; bool use_linux_io_uring:1; + int64_t *offset; /* offset of zone append operation */ int page_cache_inconsistent; /* errno from fdatasync failure */ bool has_fallocate; bool needs_alignment; @@ -1687,7 +1688,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb) ssize_t len; len = RETRY_ON_EINTR( - (aiocb->aio_type & QEMU_AIO_WRITE) ? + (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ? qemu_pwritev(aiocb->aio_fildes, aiocb->io.iov, aiocb->io.niov, @@ -1716,7 +1717,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf) ssize_t len; while (offset < aiocb->aio_nbytes) { - if (aiocb->aio_type & QEMU_AIO_WRITE) { + if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { len = pwrite(aiocb->aio_fildes, (const char *)buf + offset, aiocb->aio_nbytes - offset, @@ -1809,7 +1810,7 @@ static int handle_aiocb_rw(void *opaque) } nbytes = handle_aiocb_rw_linear(aiocb, buf); - if (!(aiocb->aio_type & QEMU_AIO_WRITE)) { + if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) { char *p = buf; size_t count = aiocb->aio_nbytes, copy; int i; @@ -2444,8 +2445,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, if (fd_open(bs) < 0) return -EIO; #if defined(CONFIG_BLKZONED) - if (type & QEMU_AIO_WRITE && bs->wps) { + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) { qemu_co_mutex_lock(&bs->wps->colock); + if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) { + int index = offset / bs->bl.zone_size; + offset = bs->wps->wp[index]; + } } #endif @@ -2493,9 +2498,13 @@ out: #if defined(CONFIG_BLKZONED) BlockZoneWps *wps = bs->wps; if (ret == 0) { - if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) { + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) + && wps && bs->bl.zone_size) { uint64_t *wp = &wps->wp[offset / bs->bl.zone_size]; if (!BDRV_ZT_IS_CONV(*wp)) { + if (type & QEMU_AIO_ZONE_APPEND) { + *s->offset = *wp; + } /* Advance the wp if needed */ if (offset + bytes > *wp) { *wp = offset + bytes; @@ -2503,12 +2512,12 @@ out: } } } else { - if (type & QEMU_AIO_WRITE) { + if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { update_zones_wp(bs, s->fd, 0, 1); } } - if (type & QEMU_AIO_WRITE && wps) { + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) { qemu_co_mutex_unlock(&wps->colock); } #endif @@ -3511,6 +3520,40 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, } #endif +#if defined(CONFIG_BLKZONED) +static int coroutine_fn raw_co_zone_append(BlockDriverState *bs, + int64_t *offset, + QEMUIOVector *qiov, + BdrvRequestFlags flags) { + assert(flags == 0); + int64_t zone_size_mask = bs->bl.zone_size - 1; + int64_t iov_len = 0; + int64_t len = 0; + BDRVRawState *s = bs->opaque; + s->offset = offset; + + if (*offset & zone_size_mask) { + error_report("sector offset %" PRId64 " is not aligned to zone size " + "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512); + return -EINVAL; + } + + int64_t wg = bs->bl.write_granularity; + int64_t wg_mask = wg - 1; + for (int i = 0; i < qiov->niov; i++) { + iov_len = qiov->iov[i].iov_len; + if (iov_len & wg_mask) { + error_report("len of IOVector[%d] %" PRId64 " is not aligned to " + "block size %" PRId64 "", i, iov_len, wg); + return -EINVAL; + } + len += iov_len; + } + + return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND); +} +#endif + static coroutine_fn int raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes, bool blkdev) @@ -4272,6 +4315,7 @@ static BlockDriver bdrv_host_device = { /* zone management operations */ .bdrv_co_zone_report = raw_co_zone_report, .bdrv_co_zone_mgmt = raw_co_zone_mgmt, + .bdrv_co_zone_append = raw_co_zone_append, #endif }; diff --git a/block/io.c b/block/io.c index ca941457d4..79e14b674c 100644 --- a/block/io.c +++ b/block/io.c @@ -3156,6 +3156,33 @@ out: return co.ret; } +int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset, + QEMUIOVector *qiov, + BdrvRequestFlags flags) +{ + int ret; + BlockDriver *drv = bs->drv; + CoroutineIOCompletion co = { + .coroutine = qemu_coroutine_self(), + }; + IO_CODE(); + + ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL); + if (ret < 0) { + return ret; + } + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) { + co.ret = -ENOTSUP; + goto out; + } + co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + void *qemu_blockalign(BlockDriverState *bs, size_t size) { IO_CODE(); diff --git a/block/io_uring.c b/block/io_uring.c index 973e15d876..f7488c241a 100644 --- a/block/io_uring.c +++ b/block/io_uring.c @@ -345,6 +345,10 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s, io_uring_prep_writev(sqes, fd, luringcb->qiov->iov, luringcb->qiov->niov, offset); break; + case QEMU_AIO_ZONE_APPEND: + io_uring_prep_writev(sqes, fd, luringcb->qiov->iov, + luringcb->qiov->niov, offset); + break; case QEMU_AIO_READ: io_uring_prep_readv(sqes, fd, luringcb->qiov->iov, luringcb->qiov->niov, offset); diff --git a/block/linux-aio.c b/block/linux-aio.c index d2cfb7f523..1959834156 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -389,6 +389,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset, case QEMU_AIO_WRITE: io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset); break; + case QEMU_AIO_ZONE_APPEND: + io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset); + break; case QEMU_AIO_READ: io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset); break; diff --git a/block/raw-format.c b/block/raw-format.c index 1a1dce8da4..9816f1af80 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -332,6 +332,13 @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len); } +static int coroutine_fn GRAPH_RDLOCK +raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov, + BdrvRequestFlags flags) +{ + return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags); +} + static int64_t coroutine_fn GRAPH_RDLOCK raw_co_getlength(BlockDriverState *bs) { @@ -637,6 +644,7 @@ BlockDriver bdrv_raw = { .bdrv_co_pdiscard = &raw_co_pdiscard, .bdrv_co_zone_report = &raw_co_zone_report, .bdrv_co_zone_mgmt = &raw_co_zone_mgmt, + .bdrv_co_zone_append = &raw_co_zone_append, .bdrv_co_block_status = &raw_co_block_status, .bdrv_co_copy_range_from = &raw_co_copy_range_from, .bdrv_co_copy_range_to = &raw_co_copy_range_to, From patchwork Thu Apr 20 12:09:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2606BC77B73 for ; Thu, 20 Apr 2023 12:11:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234654AbjDTMLY (ORCPT ); Thu, 20 Apr 2023 08:11:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234591AbjDTMLS (ORCPT ); Thu, 20 Apr 2023 08:11:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4EE94EC6 for ; Thu, 20 Apr 2023 05:10:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PrFpVAN6JVD0pfkfnDSKx7XSwcNvFBzPVVBfVtgxg3Q=; b=BxVRQ/TjxZMCX8//FVLQquvxz1d/4o61tJLA/AiKA9v86e506bRuL1kIyIx3jONMqE3HL8 1MBnVcrtScMqScKKAZO/7yypik3l1EtxKRF9F1HuuKHqWdzQByzuNs5MfHsfoJwmXCXrH+ +u/0C5KMS6JwW4n3fU6Xgii/5hxe3xI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-581-LWG6yLJgMoiquaLYconLfg-1; Thu, 20 Apr 2023 08:10:28 -0400 X-MC-Unique: LWG6yLJgMoiquaLYconLfg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A7E62811E7C; Thu, 20 Apr 2023 12:10:27 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E036492B05; Thu, 20 Apr 2023 12:10:26 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 13/20] qemu-iotests: test zone append operation Date: Thu, 20 Apr 2023 08:09:41 -0400 Message-Id: <20230420120948.436661-14-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li The patch tests zone append writes by reporting the zone wp after the completion of the call. "zap -p" option can print the sector offset value after completion, which should be the start sector where the append write begins. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Message-id: 20230407081657.17947-4-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++ tests/qemu-iotests/tests/zoned | 16 +++++++ tests/qemu-iotests/tests/zoned.out | 16 +++++++ 3 files changed, 107 insertions(+) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index f35ea627d7..3f75d2f5a6 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1874,6 +1874,80 @@ static const cmdinfo_t zone_reset_cmd = { .oneline = "reset a zone write pointer in zone block device", }; +static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov, + int64_t *offset, int flags, int *total) +{ + int async_ret = NOT_DONE; + + blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret); + while (async_ret == NOT_DONE) { + main_loop_wait(false); + } + + *total = qiov->size; + return async_ret < 0 ? async_ret : 1; +} + +static int zone_append_f(BlockBackend *blk, int argc, char **argv) +{ + int ret; + bool pflag = false; + int flags = 0; + int total = 0; + int64_t offset; + char *buf; + int c, nr_iov; + int pattern = 0xcd; + QEMUIOVector qiov; + + if (optind > argc - 3) { + return -EINVAL; + } + + if ((c = getopt(argc, argv, "p")) != -1) { + pflag = true; + } + + offset = cvtnum(argv[optind]); + if (offset < 0) { + print_cvtnum_err(offset, argv[optind]); + return offset; + } + optind++; + nr_iov = argc - optind; + buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern, + flags & BDRV_REQ_REGISTERED_BUF); + if (buf == NULL) { + return -EINVAL; + } + ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total); + if (ret < 0) { + printf("zone append failed: %s\n", strerror(-ret)); + goto out; + } + + if (pflag) { + printf("After zap done, the append sector is 0x%" PRIx64 "\n", + tosector(offset)); + } + +out: + qemu_io_free(blk, buf, qiov.size, + flags & BDRV_REQ_REGISTERED_BUF); + qemu_iovec_destroy(&qiov); + return ret; +} + +static const cmdinfo_t zone_append_cmd = { + .name = "zone_append", + .altname = "zap", + .cfunc = zone_append_f, + .argmin = 3, + .argmax = 4, + .args = "offset len [len..]", + .oneline = "append write a number of bytes at a specified offset", +}; + static int truncate_f(BlockBackend *blk, int argc, char **argv); static const cmdinfo_t truncate_cmd = { .name = "truncate", @@ -2672,6 +2746,7 @@ static void __attribute((constructor)) init_qemuio_commands(void) qemuio_add_command(&zone_close_cmd); qemuio_add_command(&zone_finish_cmd); qemuio_add_command(&zone_reset_cmd); + qemuio_add_command(&zone_append_cmd); qemuio_add_command(&truncate_cmd); qemuio_add_command(&length_cmd); qemuio_add_command(&info_cmd); diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned index 56f60616b5..3d23ce9cc1 100755 --- a/tests/qemu-iotests/tests/zoned +++ b/tests/qemu-iotests/tests/zoned @@ -82,6 +82,22 @@ echo "(5) resetting the second zone" $QEMU_IO $IMG -c "zrs 268435456 268435456" echo "After resetting a zone:" $QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo +echo "(6) append write" # the physical block size of the device is 4096 +$QEMU_IO $IMG -c "zrp 0 1" +$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000" +echo "After appending the first zone firstly:" +$QEMU_IO $IMG -c "zrp 0 1" +$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000" +echo "After appending the first zone secondly:" +$QEMU_IO $IMG -c "zrp 0 1" +$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000" +echo "After appending the second zone firstly:" +$QEMU_IO $IMG -c "zrp 268435456 1" +$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000" +echo "After appending the second zone secondly:" +$QEMU_IO $IMG -c "zrp 268435456 1" # success, all done echo "*** done" diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out index b2d061da49..fe53ba4744 100644 --- a/tests/qemu-iotests/tests/zoned.out +++ b/tests/qemu-iotests/tests/zoned.out @@ -50,4 +50,20 @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2] (5) resetting the second zone After resetting a zone: start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2] + + +(6) append write +start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2] +After zap done, the append sector is 0x0 +After appending the first zone firstly: +start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2] +After zap done, the append sector is 0x18 +After appending the first zone secondly: +start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2] +After zap done, the append sector is 0x80000 +After appending the second zone firstly: +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2] +After zap done, the append sector is 0x80018 +After appending the second zone secondly: +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2] *** done From patchwork Thu Apr 20 12:09:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85534C77B73 for ; Thu, 20 Apr 2023 12:11:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234696AbjDTMLa (ORCPT ); Thu, 20 Apr 2023 08:11:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234710AbjDTMLY (ORCPT ); Thu, 20 Apr 2023 08:11:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8B1549F5 for ; Thu, 20 Apr 2023 05:10:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992632; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aTDl/XEmmC9HnI59iodIbZpxsMJGKoy7U6ldilOesf4=; b=XX3kTrFnDbDgXQLp/CkWkKVff7MHKtULV7T3mF0debYwxrKwHd9ivwlFI21eXx3mafGzGB 51zUUa+L69Vkl8VUyjWfIyiwco4fyntLdcEhTOnvKZpJ5BWbnVeWHBtMr8B85mkUJpW5o2 qW8tHZoHYw3KjFFkgC9eXDKXtUlKlCU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-543-8s4IBVsQPYyMBPLLJYS2aQ-1; Thu, 20 Apr 2023 08:10:31 -0400 X-MC-Unique: 8s4IBVsQPYyMBPLLJYS2aQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B8C723802AC0; Thu, 20 Apr 2023 12:10:30 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id DFC572166B33; Thu, 20 Apr 2023 12:10:29 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Dmitry Fomichev Subject: [PULL 14/20] block: add some trace events for zone append Date: Thu, 20 Apr 2023 08:09:42 -0400 Message-Id: <20230420120948.436661-15-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Signed-off-by: Sam Li Reviewed-by: Dmitry Fomichev Reviewed-by: Stefan Hajnoczi Message-id: 20230407081657.17947-5-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- block/file-posix.c | 3 +++ block/trace-events | 2 ++ 2 files changed, 5 insertions(+) diff --git a/block/file-posix.c b/block/file-posix.c index 444b34dad2..972d641538 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -2504,6 +2504,8 @@ out: if (!BDRV_ZT_IS_CONV(*wp)) { if (type & QEMU_AIO_ZONE_APPEND) { *s->offset = *wp; + trace_zbd_zone_append_complete(bs, *s->offset + >> BDRV_SECTOR_BITS); } /* Advance the wp if needed */ if (offset + bytes > *wp) { @@ -3550,6 +3552,7 @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs, len += iov_len; } + trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS); return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND); } #endif diff --git a/block/trace-events b/block/trace-events index 3f4e1d088a..32665158d6 100644 --- a/block/trace-events +++ b/block/trace-events @@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d" file_flush_fdatasync_failed(int err) "errno %d" zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 "" zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors" +zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 "" +zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 "" # ssh.c sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)" From patchwork Thu Apr 20 12:09:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8B42C77B76 for ; Thu, 20 Apr 2023 12:11:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234704AbjDTML2 (ORCPT ); Thu, 20 Apr 2023 08:11:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234690AbjDTMLY (ORCPT ); Thu, 20 Apr 2023 08:11:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71E2C4C01 for ; Thu, 20 Apr 2023 05:10:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992637; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=quYXhHxcxIcDHlpEHXCQIjhLBFyQYRnb0E1OqXiaFx4=; b=glKPWa76TQBYzo8L95K6AD66OWUOLqZLXl2M2gpUA6UpD7Ql/Who4vDAbwniuWWrFtr0Cl cxapK9h9FVgBvUQu1avs70nRs1uVeMPXJ2sHz+DbYbCwNkr+0kLVtS8Ffz8h17Wsvu5gVE 8uBcNHJknq7PN9qmhtLAbQjuIHkc9SA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-416-DPp2nj3nPySjgjoIVrpUgQ-1; Thu, 20 Apr 2023 08:10:34 -0400 X-MC-Unique: DPp2nj3nPySjgjoIVrpUgQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B4416800047; Thu, 20 Apr 2023 12:10:33 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id D98D31121315; Thu, 20 Apr 2023 12:10:32 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li , Dmitry Fomichev Subject: [PULL 15/20] include: update virtio_blk headers to v6.3-rc1 Date: Thu, 20 Apr 2023 08:09:43 -0400 Message-Id: <20230420120948.436661-16-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Use scripts/update-linux-headers.sh to update headers to 6.3-rc1. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Dmitry Fomichev Message-id: 20230407082528.18841-2-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- include/standard-headers/drm/drm_fourcc.h | 12 +++ include/standard-headers/linux/ethtool.h | 48 ++++++++- include/standard-headers/linux/fuse.h | 45 +++++++- include/standard-headers/linux/pci_regs.h | 1 + include/standard-headers/linux/vhost_types.h | 2 + include/standard-headers/linux/virtio_blk.h | 105 +++++++++++++++++++ linux-headers/asm-arm64/kvm.h | 1 + linux-headers/asm-x86/kvm.h | 34 +++++- linux-headers/linux/kvm.h | 9 ++ linux-headers/linux/vfio.h | 15 +-- linux-headers/linux/vhost.h | 8 ++ 11 files changed, 270 insertions(+), 10 deletions(-) diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index 69cab17b38..dc3e6112c1 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -87,6 +87,18 @@ extern "C" { * * The authoritative list of format modifier codes is found in * `include/uapi/drm/drm_fourcc.h` + * + * Open Source User Waiver + * ----------------------- + * + * Because this is the authoritative source for pixel formats and modifiers + * referenced by GL, Vulkan extensions and other standards and hence used both + * by open source and closed source driver stacks, the usual requirement for an + * upstream in-kernel or open source userspace user does not apply. + * + * To ensure, as much as feasible, compatibility across stacks and avoid + * confusion with incompatible enumerations stakeholders for all relevant driver + * stacks should approve additions. */ #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \ diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h index 87176ab075..99fcddf04f 100644 --- a/include/standard-headers/linux/ethtool.h +++ b/include/standard-headers/linux/ethtool.h @@ -711,6 +711,24 @@ enum ethtool_stringset { ETH_SS_COUNT }; +/** + * enum ethtool_mac_stats_src - source of ethtool MAC statistics + * @ETHTOOL_MAC_STATS_SRC_AGGREGATE: + * if device supports a MAC merge layer, this retrieves the aggregate + * statistics of the eMAC and pMAC. Otherwise, it retrieves just the + * statistics of the single (express) MAC. + * @ETHTOOL_MAC_STATS_SRC_EMAC: + * if device supports a MM layer, this retrieves the eMAC statistics. + * Otherwise, it retrieves the statistics of the single (express) MAC. + * @ETHTOOL_MAC_STATS_SRC_PMAC: + * if device supports a MM layer, this retrieves the pMAC statistics. + */ +enum ethtool_mac_stats_src { + ETHTOOL_MAC_STATS_SRC_AGGREGATE, + ETHTOOL_MAC_STATS_SRC_EMAC, + ETHTOOL_MAC_STATS_SRC_PMAC, +}; + /** * enum ethtool_module_power_mode_policy - plug-in module power mode policy * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode. @@ -779,6 +797,31 @@ enum ethtool_podl_pse_pw_d_status { ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR, }; +/** + * enum ethtool_mm_verify_status - status of MAC Merge Verify function + * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN: + * verification status is unknown + * @ETHTOOL_MM_VERIFY_STATUS_INITIAL: + * the 802.3 Verify State diagram is in the state INIT_VERIFICATION + * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING: + * the Verify State diagram is in the state VERIFICATION_IDLE, + * SEND_VERIFY or WAIT_FOR_RESPONSE + * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED: + * indicates that the Verify State diagram is in the state VERIFIED + * @ETHTOOL_MM_VERIFY_STATUS_FAILED: + * the Verify State diagram is in the state VERIFY_FAIL + * @ETHTOOL_MM_VERIFY_STATUS_DISABLED: + * verification of preemption operation is disabled + */ +enum ethtool_mm_verify_status { + ETHTOOL_MM_VERIFY_STATUS_UNKNOWN, + ETHTOOL_MM_VERIFY_STATUS_INITIAL, + ETHTOOL_MM_VERIFY_STATUS_VERIFYING, + ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED, + ETHTOOL_MM_VERIFY_STATUS_FAILED, + ETHTOOL_MM_VERIFY_STATUS_DISABLED, +}; + /** * struct ethtool_gstrings - string set for data tagging * @cmd: Command number = %ETHTOOL_GSTRINGS @@ -1183,7 +1226,7 @@ struct ethtool_rxnfc { uint32_t rule_cnt; uint32_t rss_context; }; - uint32_t rule_locs[0]; + uint32_t rule_locs[]; }; @@ -1741,6 +1784,9 @@ enum ethtool_link_mode_bit_indices { ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT = 96, ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT = 97, ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT = 98, + ETHTOOL_LINK_MODE_10baseT1S_Full_BIT = 99, + ETHTOOL_LINK_MODE_10baseT1S_Half_BIT = 100, + ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT = 101, /* must be last entry */ __ETHTOOL_LINK_MODE_MASK_NBITS diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h index a1af78d989..35c131a107 100644 --- a/include/standard-headers/linux/fuse.h +++ b/include/standard-headers/linux/fuse.h @@ -201,6 +201,11 @@ * 7.38 * - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry * - add FOPEN_PARALLEL_DIRECT_WRITES + * - add total_extlen to fuse_in_header + * - add FUSE_MAX_NR_SECCTX + * - add extension header + * - add FUSE_EXT_GROUPS + * - add FUSE_CREATE_SUPP_GROUP */ #ifndef _LINUX_FUSE_H @@ -358,6 +363,8 @@ struct fuse_file_lock { * FUSE_SECURITY_CTX: add security context to create, mkdir, symlink, and * mknod * FUSE_HAS_INODE_DAX: use per inode DAX + * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir, + * symlink and mknod (single group that matches parent) */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -394,6 +401,7 @@ struct fuse_file_lock { /* bits 32..63 get shifted down 32 bits into the flags2 field */ #define FUSE_SECURITY_CTX (1ULL << 32) #define FUSE_HAS_INODE_DAX (1ULL << 33) +#define FUSE_CREATE_SUPP_GROUP (1ULL << 34) /** * CUSE INIT request/reply flags @@ -499,6 +507,17 @@ struct fuse_file_lock { */ #define FUSE_EXPIRE_ONLY (1 << 0) +/** + * extension type + * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx + * FUSE_EXT_GROUPS: &fuse_supp_groups extension + */ +enum fuse_ext_type { + /* Types 0..31 are reserved for fuse_secctx_header */ + FUSE_MAX_NR_SECCTX = 31, + FUSE_EXT_GROUPS = 32, +}; + enum fuse_opcode { FUSE_LOOKUP = 1, FUSE_FORGET = 2, /* no reply */ @@ -882,7 +901,8 @@ struct fuse_in_header { uint32_t uid; uint32_t gid; uint32_t pid; - uint32_t padding; + uint16_t total_extlen; /* length of extensions in 8byte units */ + uint16_t padding; }; struct fuse_out_header { @@ -1043,4 +1063,27 @@ struct fuse_secctx_header { uint32_t nr_secctx; }; +/** + * struct fuse_ext_header - extension header + * @size: total size of this extension including this header + * @type: type of extension + * + * This is made compatible with fuse_secctx_header by using type values > + * FUSE_MAX_NR_SECCTX + */ +struct fuse_ext_header { + uint32_t size; + uint32_t type; +}; + +/** + * struct fuse_supp_groups - Supplementary group extension + * @nr_groups: number of supplementary groups + * @groups: flexible array of group IDs + */ +struct fuse_supp_groups { + uint32_t nr_groups; + uint32_t groups[]; +}; + #endif /* _LINUX_FUSE_H */ diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h index 85ab127881..dc2000e0fe 100644 --- a/include/standard-headers/linux/pci_regs.h +++ b/include/standard-headers/linux/pci_regs.h @@ -693,6 +693,7 @@ #define PCI_EXP_LNKCTL2_TX_MARGIN 0x0380 /* Transmit Margin */ #define PCI_EXP_LNKCTL2_HASD 0x0020 /* HW Autonomous Speed Disable */ #define PCI_EXP_LNKSTA2 0x32 /* Link Status 2 */ +#define PCI_EXP_LNKSTA2_FLIT 0x0400 /* Flit Mode Status */ #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */ #define PCI_EXP_SLTCAP2 0x34 /* Slot Capabilities 2 */ #define PCI_EXP_SLTCAP2_IBPD 0x00000001 /* In-band PD Disable Supported */ diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h index c41a73fe36..88600e2d9f 100644 --- a/include/standard-headers/linux/vhost_types.h +++ b/include/standard-headers/linux/vhost_types.h @@ -163,5 +163,7 @@ struct vhost_vdpa_iova_range { #define VHOST_BACKEND_F_IOTLB_ASID 0x3 /* Device can be suspended */ #define VHOST_BACKEND_F_SUSPEND 0x4 +/* Device can be resumed */ +#define VHOST_BACKEND_F_RESUME 0x5 #endif diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h index e81715cd70..7155b1a470 100644 --- a/include/standard-headers/linux/virtio_blk.h +++ b/include/standard-headers/linux/virtio_blk.h @@ -41,6 +41,7 @@ #define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ #define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ #define VIRTIO_BLK_F_SECURE_ERASE 16 /* Secure Erase is supported */ +#define VIRTIO_BLK_F_ZONED 17 /* Zoned block device */ /* Legacy feature bits */ #ifndef VIRTIO_BLK_NO_LEGACY @@ -135,6 +136,16 @@ struct virtio_blk_config { /* Secure erase commands must be aligned to this number of sectors. */ __virtio32 secure_erase_sector_alignment; + /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */ + struct virtio_blk_zoned_characteristics { + uint32_t zone_sectors; + uint32_t max_open_zones; + uint32_t max_active_zones; + uint32_t max_append_sectors; + uint32_t write_granularity; + uint8_t model; + uint8_t unused2[3]; + } zoned; } QEMU_PACKED; /* @@ -172,6 +183,27 @@ struct virtio_blk_config { /* Secure erase command */ #define VIRTIO_BLK_T_SECURE_ERASE 14 +/* Zone append command */ +#define VIRTIO_BLK_T_ZONE_APPEND 15 + +/* Report zones command */ +#define VIRTIO_BLK_T_ZONE_REPORT 16 + +/* Open zone command */ +#define VIRTIO_BLK_T_ZONE_OPEN 18 + +/* Close zone command */ +#define VIRTIO_BLK_T_ZONE_CLOSE 20 + +/* Finish zone command */ +#define VIRTIO_BLK_T_ZONE_FINISH 22 + +/* Reset zone command */ +#define VIRTIO_BLK_T_ZONE_RESET 24 + +/* Reset All zones command */ +#define VIRTIO_BLK_T_ZONE_RESET_ALL 26 + #ifndef VIRTIO_BLK_NO_LEGACY /* Barrier before this op. */ #define VIRTIO_BLK_T_BARRIER 0x80000000 @@ -191,6 +223,72 @@ struct virtio_blk_outhdr { __virtio64 sector; }; +/* + * Supported zoned device models. + */ + +/* Regular block device */ +#define VIRTIO_BLK_Z_NONE 0 +/* Host-managed zoned device */ +#define VIRTIO_BLK_Z_HM 1 +/* Host-aware zoned device */ +#define VIRTIO_BLK_Z_HA 2 + +/* + * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply. + */ +struct virtio_blk_zone_descriptor { + /* Zone capacity */ + uint64_t z_cap; + /* The starting sector of the zone */ + uint64_t z_start; + /* Zone write pointer position in sectors */ + uint64_t z_wp; + /* Zone type */ + uint8_t z_type; + /* Zone state */ + uint8_t z_state; + uint8_t reserved[38]; +}; + +struct virtio_blk_zone_report { + uint64_t nr_zones; + uint8_t reserved[56]; + struct virtio_blk_zone_descriptor zones[]; +}; + +/* + * Supported zone types. + */ + +/* Conventional zone */ +#define VIRTIO_BLK_ZT_CONV 1 +/* Sequential Write Required zone */ +#define VIRTIO_BLK_ZT_SWR 2 +/* Sequential Write Preferred zone */ +#define VIRTIO_BLK_ZT_SWP 3 + +/* + * Zone states that are available for zones of all types. + */ + +/* Not a write pointer (conventional zones only) */ +#define VIRTIO_BLK_ZS_NOT_WP 0 +/* Empty */ +#define VIRTIO_BLK_ZS_EMPTY 1 +/* Implicitly Open */ +#define VIRTIO_BLK_ZS_IOPEN 2 +/* Explicitly Open */ +#define VIRTIO_BLK_ZS_EOPEN 3 +/* Closed */ +#define VIRTIO_BLK_ZS_CLOSED 4 +/* Read-Only */ +#define VIRTIO_BLK_ZS_RDONLY 13 +/* Full */ +#define VIRTIO_BLK_ZS_FULL 14 +/* Offline */ +#define VIRTIO_BLK_ZS_OFFLINE 15 + /* Unmap this range (only valid for write zeroes command) */ #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001 @@ -217,4 +315,11 @@ struct virtio_scsi_inhdr { #define VIRTIO_BLK_S_OK 0 #define VIRTIO_BLK_S_IOERR 1 #define VIRTIO_BLK_S_UNSUPP 2 + +/* Error codes that are specific to zoned block devices */ +#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3 +#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4 +#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5 +#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6 + #endif /* _LINUX_VIRTIO_BLK_H */ diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h index a7cfefb3a8..d7e7bb885e 100644 --- a/linux-headers/asm-arm64/kvm.h +++ b/linux-headers/asm-arm64/kvm.h @@ -109,6 +109,7 @@ struct kvm_regs { #define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */ #define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */ #define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */ +#define KVM_ARM_VCPU_HAS_EL2 7 /* Support nested virtualization */ struct kvm_vcpu_init { __u32 target; diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h index 2747d2ce14..2937e7bf69 100644 --- a/linux-headers/asm-x86/kvm.h +++ b/linux-headers/asm-x86/kvm.h @@ -9,6 +9,7 @@ #include #include +#include #define KVM_PIO_PAGE_OFFSET 1 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 @@ -505,8 +506,8 @@ struct kvm_nested_state { * KVM_{GET,PUT}_NESTED_STATE ioctl values. */ union { - struct kvm_vmx_nested_state_data vmx[0]; - struct kvm_svm_nested_state_data svm[0]; + __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx); + __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm); } data; }; @@ -523,6 +524,35 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0) +#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS) + +/* + * Masked event layout. + * Bits Description + * ---- ----------- + * 7:0 event select (low bits) + * 15:8 umask match + * 31:16 unused + * 35:32 event select (high bits) + * 36:54 unused + * 55 exclude bit + * 63:56 umask mask + */ + +#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \ + (((event_select) & 0xFFULL) | (((event_select) & 0XF00ULL) << 24) | \ + (((mask) & 0xFFULL) << 56) | \ + (((match) & 0xFFULL) << 8) | \ + ((__u64)(!!(exclude)) << 55)) + +#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \ + (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32)) +#define KVM_PMU_MASKED_ENTRY_UMASK_MASK (GENMASK_ULL(63, 56)) +#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH (GENMASK_ULL(15, 8)) +#define KVM_PMU_MASKED_ENTRY_EXCLUDE (BIT_ULL(55)) +#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT (56) + /* for KVM_{GET,SET,HAS}_DEVICE_ATTR */ #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 1e2c16cfe3..599de3c6e3 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -581,6 +581,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */ + __u8 pad1[6]; /* ignored */ + __u64 old_addr; /* ignored if cmpxchg flag unset */ }; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */ @@ -593,11 +595,17 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_SIDA_WRITE 3 #define KVM_S390_MEMOP_ABSOLUTE_READ 4 #define KVM_S390_MEMOP_ABSOLUTE_WRITE 5 +#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG 6 + /* flags for kvm_s390_mem_op->flags */ #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */ +#define KVM_S390_MEMOP_EXTENSION_CAP_BASE (1 << 0) +#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG (1 << 1) + /* for KVM_INTERRUPT */ struct kvm_interrupt { /* in */ @@ -1173,6 +1181,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 +#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index c59692ce0b..4a534edbdc 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -49,7 +49,11 @@ /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ #define VFIO_UNMAP_ALL 9 -/* Supports the vaddr flag for DMA map and unmap */ +/* + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated + * devices, so this capability is subject to change as groups are added or + * removed. + */ #define VFIO_UPDATE_VADDR 10 /* @@ -1343,8 +1347,7 @@ struct vfio_iommu_type1_info_dma_avail { * Map process virtual addresses to IO virtual addresses using the * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required. * - * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and - * unblock translation of host virtual addresses in the iova range. The vaddr + * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr * must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR. To * maintain memory consistency within the user application, the updated vaddr * must address the same memory object as originally mapped. Failure to do so @@ -1395,9 +1398,9 @@ struct vfio_bitmap { * must be 0. This cannot be combined with the get-dirty-bitmap flag. * * If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host - * virtual addresses in the iova range. Tasks that attempt to translate an - * iova's vaddr will block. DMA to already-mapped pages continues. This - * cannot be combined with the get-dirty-bitmap flag. + * virtual addresses in the iova range. DMA to already-mapped pages continues. + * Groups may not be added to the container while any addresses are invalid. + * This cannot be combined with the get-dirty-bitmap flag. */ struct vfio_iommu_type1_dma_unmap { __u32 argsz; diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index f9f115a7c7..92e1b700b5 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -180,4 +180,12 @@ */ #define VHOST_VDPA_SUSPEND _IO(VHOST_VIRTIO, 0x7D) +/* Resume a device so it can resume processing virtqueue requests + * + * After the return of this ioctl the device will have restored all the + * necessary states and it is fully operational to continue processing the + * virtqueue descriptors. + */ +#define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E) + #endif From patchwork Thu Apr 20 12:09:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41793C77B72 for ; Thu, 20 Apr 2023 12:11:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234675AbjDTMLc (ORCPT ); Thu, 20 Apr 2023 08:11:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234591AbjDTMLa (ORCPT ); Thu, 20 Apr 2023 08:11:30 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3802D4EFF for ; Thu, 20 Apr 2023 05:10:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992639; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k626FdPdGfdDIBs+vVn6qAz8If5+bEd5obmRigwZQtE=; b=TWMQ0DetFRsqcvWLaRH4L/kxhg63BnP/UN41xySplT0N1QPgeoZ9Sz8xobhb9Tpxy4Vw9G WddDEt85e7i8yocT8oUYEp+nebYIJfpNfanXOk3V0xqqQ+zjybNp0P07W7EhqyCKijPozB /yVZPbUQQoIgrkFUa4nLf8wFEhHZUco= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-651-bmw3oYQeNnyJCu3OXqpLcw-1; Thu, 20 Apr 2023 08:10:37 -0400 X-MC-Unique: bmw3oYQeNnyJCu3OXqpLcw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 91E132823811; Thu, 20 Apr 2023 12:10:36 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id A02BF440BC; Thu, 20 Apr 2023 12:10:35 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 16/20] virtio-blk: add zoned storage emulation for zoned devices Date: Thu, 20 Apr 2023 08:09:44 -0400 Message-Id: <20230420120948.436661-17-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li This patch extends virtio-blk emulation to handle zoned device commands by calling the new block layer APIs to perform zoned device I/O on behalf of the guest. It supports Report Zone, four zone oparations (open, close, finish, reset), and Append Zone. The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does support zoned block devices. Regular block devices(conventional zones) will not be set. The guest os can use blktests, fio to test those commands on zoned devices. Furthermore, using zonefs to test zone append write is also supported. Signed-off-by: Sam Li Message-id: 20230407082528.18841-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- hw/block/virtio-blk-common.c | 2 + hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++ hw/virtio/virtio-qmp.c | 2 + 3 files changed, 393 insertions(+) diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c index ac52d7c176..e2f8e2f6da 100644 --- a/hw/block/virtio-blk-common.c +++ b/hw/block/virtio-blk-common.c @@ -29,6 +29,8 @@ static const VirtIOFeature feature_sizes[] = { .end = endof(struct virtio_blk_config, discard_sector_alignment)}, {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, + {.flags = 1ULL << VIRTIO_BLK_F_ZONED, + .end = endof(struct virtio_blk_config, zoned)}, {} }; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index cefca93b31..8b6030b1a5 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -17,6 +17,7 @@ #include "qemu/module.h" #include "qemu/error-report.h" #include "qemu/main-loop.h" +#include "block/block_int.h" #include "trace.h" #include "hw/block/block.h" #include "hw/qdev-properties.h" @@ -601,6 +602,335 @@ err: return err_status; } +typedef struct ZoneCmdData { + VirtIOBlockReq *req; + struct iovec *in_iov; + unsigned in_num; + union { + struct { + unsigned int nr_zones; + BlockZoneDescriptor *zones; + } zone_report_data; + struct { + int64_t offset; + } zone_append_data; + }; +} ZoneCmdData; + +/* + * check zoned_request: error checking before issuing requests. If all checks + * passed, return true. + * append: true if only zone append requests issued. + */ +static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len, + bool append, uint8_t *status) { + BlockDriverState *bs = blk_bs(s->blk); + int index; + + if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) { + *status = VIRTIO_BLK_S_UNSUPP; + return false; + } + + if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS) + || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) { + *status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + return false; + } + + if (append) { + if (bs->bl.write_granularity) { + if ((offset % bs->bl.write_granularity) != 0) { + *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP; + return false; + } + } + + index = offset / bs->bl.zone_size; + if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) { + *status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + return false; + } + + if (len / 512 > bs->bl.max_append_sectors) { + if (bs->bl.max_append_sectors == 0) { + *status = VIRTIO_BLK_S_UNSUPP; + } else { + *status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + } + return false; + } + } + return true; +} + +static void virtio_blk_zone_report_complete(void *opaque, int ret) +{ + ZoneCmdData *data = opaque; + VirtIOBlockReq *req = data->req; + VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(req->dev); + struct iovec *in_iov = data->in_iov; + unsigned in_num = data->in_num; + int64_t zrp_size, n, j = 0; + int64_t nz = data->zone_report_data.nr_zones; + int8_t err_status = VIRTIO_BLK_S_OK; + + if (ret) { + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + goto out; + } + + struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) { + .nr_zones = cpu_to_le64(nz), + }; + zrp_size = sizeof(struct virtio_blk_zone_report) + + sizeof(struct virtio_blk_zone_descriptor) * nz; + n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr)); + if (n != sizeof(zrp_hdr)) { + virtio_error(vdev, "Driver provided input buffer that is too small!"); + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + goto out; + } + + for (size_t i = sizeof(zrp_hdr); i < zrp_size; + i += sizeof(struct virtio_blk_zone_descriptor), ++j) { + struct virtio_blk_zone_descriptor desc = + (struct virtio_blk_zone_descriptor) { + .z_start = cpu_to_le64(data->zone_report_data.zones[j].start + >> BDRV_SECTOR_BITS), + .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap + >> BDRV_SECTOR_BITS), + .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp + >> BDRV_SECTOR_BITS), + }; + + switch (data->zone_report_data.zones[j].type) { + case BLK_ZT_CONV: + desc.z_type = VIRTIO_BLK_ZT_CONV; + break; + case BLK_ZT_SWR: + desc.z_type = VIRTIO_BLK_ZT_SWR; + break; + case BLK_ZT_SWP: + desc.z_type = VIRTIO_BLK_ZT_SWP; + break; + default: + g_assert_not_reached(); + } + + switch (data->zone_report_data.zones[j].state) { + case BLK_ZS_RDONLY: + desc.z_state = VIRTIO_BLK_ZS_RDONLY; + break; + case BLK_ZS_OFFLINE: + desc.z_state = VIRTIO_BLK_ZS_OFFLINE; + break; + case BLK_ZS_EMPTY: + desc.z_state = VIRTIO_BLK_ZS_EMPTY; + break; + case BLK_ZS_CLOSED: + desc.z_state = VIRTIO_BLK_ZS_CLOSED; + break; + case BLK_ZS_FULL: + desc.z_state = VIRTIO_BLK_ZS_FULL; + break; + case BLK_ZS_EOPEN: + desc.z_state = VIRTIO_BLK_ZS_EOPEN; + break; + case BLK_ZS_IOPEN: + desc.z_state = VIRTIO_BLK_ZS_IOPEN; + break; + case BLK_ZS_NOT_WP: + desc.z_state = VIRTIO_BLK_ZS_NOT_WP; + break; + default: + g_assert_not_reached(); + } + + /* TODO: it takes O(n^2) time complexity. Optimizations required. */ + n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc)); + if (n != sizeof(desc)) { + virtio_error(vdev, "Driver provided input buffer " + "for descriptors that is too small!"); + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + } + } + +out: + aio_context_acquire(blk_get_aio_context(s->conf.conf.blk)); + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); + aio_context_release(blk_get_aio_context(s->conf.conf.blk)); + g_free(data->zone_report_data.zones); + g_free(data); +} + +static void virtio_blk_handle_zone_report(VirtIOBlockReq *req, + struct iovec *in_iov, + unsigned in_num) +{ + VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(s); + unsigned int nr_zones; + ZoneCmdData *data; + int64_t zone_size, offset; + uint8_t err_status; + + if (req->in_len < sizeof(struct virtio_blk_inhdr) + + sizeof(struct virtio_blk_zone_report) + + sizeof(struct virtio_blk_zone_descriptor)) { + virtio_error(vdev, "in buffer too small for zone report"); + return; + } + + /* start byte offset of the zone report */ + offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS; + if (!check_zoned_request(s, offset, 0, false, &err_status)) { + goto out; + } + nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) - + sizeof(struct virtio_blk_zone_report)) / + sizeof(struct virtio_blk_zone_descriptor); + + zone_size = sizeof(BlockZoneDescriptor) * nr_zones; + data = g_malloc(sizeof(ZoneCmdData)); + data->req = req; + data->in_iov = in_iov; + data->in_num = in_num; + data->zone_report_data.nr_zones = nr_zones; + data->zone_report_data.zones = g_malloc(zone_size), + + blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones, + data->zone_report_data.zones, + virtio_blk_zone_report_complete, data); + return; +out: + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); +} + +static void virtio_blk_zone_mgmt_complete(void *opaque, int ret) +{ + VirtIOBlockReq *req = opaque; + VirtIOBlock *s = req->dev; + int8_t err_status = VIRTIO_BLK_S_OK; + + if (ret) { + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + } + + aio_context_acquire(blk_get_aio_context(s->conf.conf.blk)); + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); + aio_context_release(blk_get_aio_context(s->conf.conf.blk)); +} + +static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op) +{ + VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(s); + BlockDriverState *bs = blk_bs(s->blk); + int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS; + uint64_t len; + uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS; + uint8_t err_status = VIRTIO_BLK_S_OK; + + uint32_t type = virtio_ldl_p(vdev, &req->out.type); + if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) { + /* Entire drive capacity */ + offset = 0; + len = capacity; + } else { + if (bs->bl.zone_size > capacity - offset) { + /* The zoned device allows the last smaller zone. */ + len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1); + } else { + len = bs->bl.zone_size; + } + } + + if (!check_zoned_request(s, offset, len, false, &err_status)) { + goto out; + } + + blk_aio_zone_mgmt(s->blk, op, offset, len, + virtio_blk_zone_mgmt_complete, req); + + return 0; +out: + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); + return err_status; +} + +static void virtio_blk_zone_append_complete(void *opaque, int ret) +{ + ZoneCmdData *data = opaque; + VirtIOBlockReq *req = data->req; + VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(req->dev); + int64_t append_sector, n; + uint8_t err_status = VIRTIO_BLK_S_OK; + + if (ret) { + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + goto out; + } + + virtio_stq_p(vdev, &append_sector, + data->zone_append_data.offset >> BDRV_SECTOR_BITS); + n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector, + sizeof(append_sector)); + if (n != sizeof(append_sector)) { + virtio_error(vdev, "Driver provided input buffer less than size of " + "append_sector"); + err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; + goto out; + } + +out: + aio_context_acquire(blk_get_aio_context(s->conf.conf.blk)); + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); + aio_context_release(blk_get_aio_context(s->conf.conf.blk)); + g_free(data); +} + +static int virtio_blk_handle_zone_append(VirtIOBlockReq *req, + struct iovec *out_iov, + struct iovec *in_iov, + uint64_t out_num, + unsigned in_num) { + VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(s); + uint8_t err_status = VIRTIO_BLK_S_OK; + + int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS; + int64_t len = iov_size(out_iov, out_num); + + if (!check_zoned_request(s, offset, len, true, &err_status)) { + goto out; + } + + ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData)); + data->req = req; + data->in_iov = in_iov; + data->in_num = in_num; + data->zone_append_data.offset = offset; + qemu_iovec_init_external(&req->qiov, out_iov, out_num); + blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0, + virtio_blk_zone_append_complete, data); + return 0; + +out: + aio_context_acquire(blk_get_aio_context(s->conf.conf.blk)); + virtio_blk_req_complete(req, err_status); + virtio_blk_free_request(req); + aio_context_release(blk_get_aio_context(s->conf.conf.blk)); + return err_status; +} + static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) { uint32_t type; @@ -687,6 +1017,24 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) case VIRTIO_BLK_T_FLUSH: virtio_blk_handle_flush(req, mrb); break; + case VIRTIO_BLK_T_ZONE_REPORT: + virtio_blk_handle_zone_report(req, in_iov, in_num); + break; + case VIRTIO_BLK_T_ZONE_OPEN: + virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN); + break; + case VIRTIO_BLK_T_ZONE_CLOSE: + virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE); + break; + case VIRTIO_BLK_T_ZONE_FINISH: + virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH); + break; + case VIRTIO_BLK_T_ZONE_RESET: + virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET); + break; + case VIRTIO_BLK_T_ZONE_RESET_ALL: + virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET); + break; case VIRTIO_BLK_T_SCSI_CMD: virtio_blk_handle_scsi(req); break; @@ -705,6 +1053,14 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) virtio_blk_free_request(req); break; } + case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT: + /* + * Passing out_iov/out_num and in_iov/in_num is not safe + * to access req->elem.out_sg directly because it may be + * modified by virtio_blk_handle_request(). + */ + virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num); + break; /* * VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with * VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement, @@ -890,6 +1246,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = VIRTIO_BLK(vdev); BlockConf *conf = &s->conf.conf; + BlockDriverState *bs = blk_bs(s->blk); struct virtio_blk_config blkcfg; uint64_t capacity; int64_t length; @@ -954,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) blkcfg.write_zeroes_may_unmap = 1; virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1); } + if (bs->bl.zoned != BLK_Z_NONE) { + switch (bs->bl.zoned) { + case BLK_Z_HM: + blkcfg.zoned.model = VIRTIO_BLK_Z_HM; + break; + case BLK_Z_HA: + blkcfg.zoned.model = VIRTIO_BLK_Z_HA; + break; + default: + g_assert_not_reached(); + } + + virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors, + bs->bl.zone_size / 512); + virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones, + bs->bl.max_active_zones); + virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones, + bs->bl.max_open_zones); + virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size); + virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors, + bs->bl.max_append_sectors); + } else { + blkcfg.zoned.model = VIRTIO_BLK_Z_NONE; + } memcpy(config, &blkcfg, s->config_size); } @@ -1118,6 +1499,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIOBlock *s = VIRTIO_BLK(dev); VirtIOBlkConf *conf = &s->conf; + BlockDriverState *bs = blk_bs(conf->conf.blk); Error *err = NULL; unsigned i; @@ -1163,6 +1545,13 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } + if (bs->bl.zoned != BLK_Z_NONE) { + virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED); + if (bs->bl.zoned == BLK_Z_HM) { + virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD); + } + } + if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) && (!conf->max_discard_sectors || conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) { diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c index b70148aba9..e84316dcfd 100644 --- a/hw/virtio/virtio-qmp.c +++ b/hw/virtio/virtio-qmp.c @@ -176,6 +176,8 @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = { "VIRTIO_BLK_F_DISCARD: Discard command supported"), FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \ "VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"), + FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \ + "VIRTIO_BLK_F_ZONED: Zoned block devices"), #ifndef VIRTIO_BLK_NO_LEGACY FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \ "VIRTIO_BLK_F_BARRIER: Request barriers supported"), From patchwork Thu Apr 20 12:09:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218666 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CCD7C77B73 for ; Thu, 20 Apr 2023 12:11:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234725AbjDTMLe (ORCPT ); Thu, 20 Apr 2023 08:11:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234662AbjDTMLc (ORCPT ); Thu, 20 Apr 2023 08:11:32 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 008984C1F for ; Thu, 20 Apr 2023 05:10:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992643; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SX5USox1u1F1uI9//zgqx+HUZRsDEp22iB4vE6fWErk=; b=PlgXSaWi8uShH7h5CbV7zdFPeCi2iEqSGLqmZRlTt4pJElIA7IQpYdb2cIbBHRgkhHqKj/ 13PI4GGcp0kz3XgRyj8GTG2xy1usBuGonsWlpWz2/xR56mbHMMLyWAYfMq7Cta5GpSNHo7 OalOQTz3pHzhZj9WcXwx6TXxl5c6W5k= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-74-5WzItbzlMIukE2mMZ-FIUA-1; Thu, 20 Apr 2023 08:10:40 -0400 X-MC-Unique: 5WzItbzlMIukE2mMZ-FIUA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8C03288B776; Thu, 20 Apr 2023 12:10:39 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 634C52026DFD; Thu, 20 Apr 2023 12:10:38 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 17/20] block: add accounting for zone append operation Date: Thu, 20 Apr 2023 08:09:45 -0400 Message-Id: <20230420120948.436661-18-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Taking account of the new zone append write operation for zoned devices, BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read, write, flush). Signed-off-by: Sam Li Message-id: 20230407082528.18841-4-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------ qapi/block.json | 4 +++ include/block/accounting.h | 1 + block/qapi-sysemu.c | 11 ++++++ block/qapi.c | 18 ++++++++++ hw/block/virtio-blk.c | 4 +++ 6 files changed, 95 insertions(+), 11 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index c05ad0c07e..44a70aad21 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -849,6 +849,10 @@ # @min_wr_latency_ns: Minimum latency of write operations in the # defined interval, in nanoseconds. # +# @min_zone_append_latency_ns: Minimum latency of zone append operations +# in the defined interval, in nanoseconds +# (since 8.1) +# # @min_flush_latency_ns: Minimum latency of flush operations in the # defined interval, in nanoseconds. # @@ -858,6 +862,10 @@ # @max_wr_latency_ns: Maximum latency of write operations in the # defined interval, in nanoseconds. # +# @max_zone_append_latency_ns: Maximum latency of zone append operations +# in the defined interval, in nanoseconds +# (since 8.1) +# # @max_flush_latency_ns: Maximum latency of flush operations in the # defined interval, in nanoseconds. # @@ -867,6 +875,10 @@ # @avg_wr_latency_ns: Average latency of write operations in the # defined interval, in nanoseconds. # +# @avg_zone_append_latency_ns: Average latency of zone append operations +# in the defined interval, in nanoseconds +# (since 8.1) +# # @avg_flush_latency_ns: Average latency of flush operations in the # defined interval, in nanoseconds. # @@ -876,15 +888,23 @@ # @avg_wr_queue_depth: Average number of pending write operations # in the defined interval. # +# @avg_zone_append_queue_depth: Average number of pending zone append +# operations in the defined interval +# (since 8.1). +# # Since: 2.5 ## { 'struct': 'BlockDeviceTimedStats', 'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int', 'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int', 'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int', - 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int', - 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int', - 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } } + 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int', + 'max_zone_append_latency_ns': 'int', + 'avg_zone_append_latency_ns': 'int', + 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int', + 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number', + 'avg_wr_queue_depth': 'number', + 'avg_zone_append_queue_depth': 'number' } } ## # @BlockDeviceStats: @@ -895,12 +915,18 @@ # # @wr_bytes: The number of bytes written by the device. # +# @zone_append_bytes: The number of bytes appended by the zoned devices +# (since 8.1) +# # @unmap_bytes: The number of bytes unmapped by the device (Since 4.2) # # @rd_operations: The number of read operations performed by the device. # # @wr_operations: The number of write operations performed by the device. # +# @zone_append_operations: The number of zone append operations performed +# by the zoned devices (since 8.1) +# # @flush_operations: The number of cache flush operations performed by the # device (since 0.15) # @@ -911,6 +937,9 @@ # # @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15). # +# @zone_append_total_time_ns: Total time spent on zone append writes +# in nanoseconds (since 8.1) +# # @flush_total_time_ns: Total time spent on cache flushes in nanoseconds # (since 0.15). # @@ -928,6 +957,9 @@ # @wr_merged: Number of write requests that have been merged into another # request (Since 2.3). # +# @zone_append_merged: Number of zone append requests that have been merged +# into another request (since 8.1) +# # @unmap_merged: Number of unmap requests that have been merged into another # request (Since 4.2) # @@ -941,6 +973,10 @@ # @failed_wr_operations: The number of failed write operations # performed by the device (Since 2.5) # +# @failed_zone_append_operations: The number of failed zone append write +# operations performed by the zoned devices +# (since 8.1) +# # @failed_flush_operations: The number of failed flush operations # performed by the device (Since 2.5) # @@ -953,6 +989,9 @@ # @invalid_wr_operations: The number of invalid write operations # performed by the device (Since 2.5) # +# @invalid_zone_append_operations: The number of invalid zone append operations +# performed by the zoned device (since 8.1) +# # @invalid_flush_operations: The number of invalid flush operations # performed by the device (Since 2.5) # @@ -972,27 +1011,34 @@ # # @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0) # +# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1) +# # @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0) # # Since: 0.14 ## { 'struct': 'BlockDeviceStats', - 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int', - 'rd_operations': 'int', 'wr_operations': 'int', + 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int', + 'unmap_bytes' : 'int', 'rd_operations': 'int', + 'wr_operations': 'int', 'zone_append_operations': 'int', 'flush_operations': 'int', 'unmap_operations': 'int', 'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int', - 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int', - 'wr_highest_offset': 'int', - 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int', - '*idle_time_ns': 'int', + 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int', + 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int', + 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int', + 'unmap_merged': 'int', '*idle_time_ns': 'int', 'failed_rd_operations': 'int', 'failed_wr_operations': 'int', - 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int', - 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int', + 'failed_zone_append_operations': 'int', + 'failed_flush_operations': 'int', + 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int', + 'invalid_wr_operations': 'int', + 'invalid_zone_append_operations': 'int', 'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int', 'account_invalid': 'bool', 'account_failed': 'bool', 'timed_stats': ['BlockDeviceTimedStats'], '*rd_latency_histogram': 'BlockLatencyHistogramInfo', '*wr_latency_histogram': 'BlockLatencyHistogramInfo', + '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo', '*flush_latency_histogram': 'BlockLatencyHistogramInfo' } } ## diff --git a/qapi/block.json b/qapi/block.json index 5fe068f903..5a57ef4a9f 100644 --- a/qapi/block.json +++ b/qapi/block.json @@ -525,6 +525,9 @@ # @boundaries-write: list of interval boundary values for write latency # histogram. # +# @boundaries-zap: list of interval boundary values for zone append write +# latency histogram. +# # @boundaries-flush: list of interval boundary values for flush latency # histogram. # @@ -573,5 +576,6 @@ '*boundaries': ['uint64'], '*boundaries-read': ['uint64'], '*boundaries-write': ['uint64'], + '*boundaries-zap': ['uint64'], '*boundaries-flush': ['uint64'] }, 'allow-preconfig': true } diff --git a/include/block/accounting.h b/include/block/accounting.h index b9caad60d5..a59e39f49d 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -37,6 +37,7 @@ enum BlockAcctType { BLOCK_ACCT_READ, BLOCK_ACCT_WRITE, BLOCK_ACCT_FLUSH, + BLOCK_ACCT_ZONE_APPEND, BLOCK_ACCT_UNMAP, BLOCK_MAX_IOTYPE, }; diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c index 7bd7554150..cec3c1afb4 100644 --- a/block/qapi-sysemu.c +++ b/block/qapi-sysemu.c @@ -517,6 +517,7 @@ void qmp_block_latency_histogram_set( bool has_boundaries, uint64List *boundaries, bool has_boundaries_read, uint64List *boundaries_read, bool has_boundaries_write, uint64List *boundaries_write, + bool has_boundaries_append, uint64List *boundaries_append, bool has_boundaries_flush, uint64List *boundaries_flush, Error **errp) { @@ -557,6 +558,16 @@ void qmp_block_latency_histogram_set( } } + if (has_boundaries || has_boundaries_append) { + ret = block_latency_histogram_set( + stats, BLOCK_ACCT_ZONE_APPEND, + has_boundaries_append ? boundaries_append : boundaries); + if (ret) { + error_setg(errp, "Device '%s' set append write boundaries fail", id); + return; + } + } + if (has_boundaries || has_boundaries_flush) { ret = block_latency_histogram_set( stats, BLOCK_ACCT_FLUSH, diff --git a/block/qapi.c b/block/qapi.c index c84147849d..2684484e9d 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -533,27 +533,36 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ]; ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE]; + ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND]; ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP]; ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ]; ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE]; + ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND]; ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP]; ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ]; ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE]; + ds->failed_zone_append_operations = + stats->failed_ops[BLOCK_ACCT_ZONE_APPEND]; ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH]; ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP]; ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ]; ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE]; + ds->invalid_zone_append_operations = + stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND]; ds->invalid_flush_operations = stats->invalid_ops[BLOCK_ACCT_FLUSH]; ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP]; ds->rd_merged = stats->merged[BLOCK_ACCT_READ]; ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE]; + ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND]; ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP]; ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH]; ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE]; + ds->zone_append_total_time_ns = + stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND]; ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ]; ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH]; ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP]; @@ -571,6 +580,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ]; TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE]; + TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND]; TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH]; dev_stats->interval_length = ts->interval_length; @@ -583,6 +593,10 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) dev_stats->max_wr_latency_ns = timed_average_max(wr); dev_stats->avg_wr_latency_ns = timed_average_avg(wr); + dev_stats->min_zone_append_latency_ns = timed_average_min(zap); + dev_stats->max_zone_append_latency_ns = timed_average_max(zap); + dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap); + dev_stats->min_flush_latency_ns = timed_average_min(fl); dev_stats->max_flush_latency_ns = timed_average_max(fl); dev_stats->avg_flush_latency_ns = timed_average_avg(fl); @@ -591,6 +605,8 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) block_acct_queue_depth(ts, BLOCK_ACCT_READ); dev_stats->avg_wr_queue_depth = block_acct_queue_depth(ts, BLOCK_ACCT_WRITE); + dev_stats->avg_zone_append_queue_depth = + block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND); QAPI_LIST_PREPEND(ds->timed_stats, dev_stats); } @@ -600,6 +616,8 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]); ds->wr_latency_histogram = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]); + ds->zone_append_latency_histogram + = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]); ds->flush_latency_histogram = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]); } diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8b6030b1a5..a9d3168770 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -919,6 +919,10 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req, data->in_num = in_num; data->zone_append_data.offset = offset; qemu_iovec_init_external(&req->qiov, out_iov, out_num); + + block_acct_start(blk_get_stats(s->blk), &req->acct, len, + BLOCK_ACCT_ZONE_APPEND); + blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0, virtio_blk_zone_append_complete, data); return 0; From patchwork Thu Apr 20 12:09:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0B33C77B72 for ; Thu, 20 Apr 2023 12:11:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234747AbjDTMLk (ORCPT ); Thu, 20 Apr 2023 08:11:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234728AbjDTMLf (ORCPT ); Thu, 20 Apr 2023 08:11:35 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39DCA4C18 for ; Thu, 20 Apr 2023 05:10:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992646; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZKzhXNubl/VONaMQB/a+AhnMCtpdzNWj/7QJoWHG/H0=; b=QkInuw6myqAMlNd/aJTugHZ1SiDa/DFgPydYQkWyqexkWPypLs0c6w9qk0v1Hbwe7phbUZ FrgyySo0qq8PWH3pqlFjVMxzmswSlO97p1KkDbWH42gO8cM/QBhpMIB3Ey67pga6C0cSSE aERnEQbJhSrRkMEEfxfowUcf9EFj/+A= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-375-Ae-w_a5SNLatU1O_DQANrg-1; Thu, 20 Apr 2023 08:10:43 -0400 X-MC-Unique: Ae-w_a5SNLatU1O_DQANrg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 273653802123; Thu, 20 Apr 2023 12:10:42 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8DF1D20239E0; Thu, 20 Apr 2023 12:10:41 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 18/20] virtio-blk: add some trace events for zoned emulation Date: Thu, 20 Apr 2023 08:09:46 -0400 Message-Id: <20230420120948.436661-19-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Message-id: 20230407082528.18841-5-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi --- hw/block/virtio-blk.c | 12 ++++++++++++ hw/block/trace-events | 7 +++++++ 2 files changed, 19 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index a9d3168770..7a66056c71 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -676,6 +676,7 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret) int64_t nz = data->zone_report_data.nr_zones; int8_t err_status = VIRTIO_BLK_S_OK; + trace_virtio_blk_zone_report_complete(vdev, req, nz, ret); if (ret) { err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; goto out; @@ -792,6 +793,8 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req, nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) - sizeof(struct virtio_blk_zone_report)) / sizeof(struct virtio_blk_zone_descriptor); + trace_virtio_blk_handle_zone_report(vdev, req, + offset >> BDRV_SECTOR_BITS, nr_zones); zone_size = sizeof(BlockZoneDescriptor) * nr_zones; data = g_malloc(sizeof(ZoneCmdData)); @@ -814,7 +817,9 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret) { VirtIOBlockReq *req = opaque; VirtIOBlock *s = req->dev; + VirtIODevice *vdev = VIRTIO_DEVICE(s); int8_t err_status = VIRTIO_BLK_S_OK; + trace_virtio_blk_zone_mgmt_complete(vdev, req,ret); if (ret) { err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; @@ -841,6 +846,8 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op) /* Entire drive capacity */ offset = 0; len = capacity; + trace_virtio_blk_handle_zone_reset_all(vdev, req, 0, + bs->total_sectors); } else { if (bs->bl.zone_size > capacity - offset) { /* The zoned device allows the last smaller zone. */ @@ -848,6 +855,9 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op) } else { len = bs->bl.zone_size; } + trace_virtio_blk_handle_zone_mgmt(vdev, req, op, + offset >> BDRV_SECTOR_BITS, + len >> BDRV_SECTOR_BITS); } if (!check_zoned_request(s, offset, len, false, &err_status)) { @@ -888,6 +898,7 @@ static void virtio_blk_zone_append_complete(void *opaque, int ret) err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD; goto out; } + trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret); out: aio_context_acquire(blk_get_aio_context(s->conf.conf.blk)); @@ -909,6 +920,7 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req, int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS; int64_t len = iov_size(out_iov, out_num); + trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS); if (!check_zoned_request(s, offset, len, true, &err_status)) { goto out; } diff --git a/hw/block/trace-events b/hw/block/trace-events index 2c45a62bd5..34be8b9135 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -44,9 +44,16 @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x" # virtio-blk.c virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d" virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d" +virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d" +virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d" +virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d" virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu" virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu" virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d" +virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u" +virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 "" +virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 "" +virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 "" # hd-geometry.c hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d" From patchwork Thu Apr 20 12:09:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218668 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E221C77B73 for ; Thu, 20 Apr 2023 12:11:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234745AbjDTMLp (ORCPT ); Thu, 20 Apr 2023 08:11:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234127AbjDTMLj (ORCPT ); Thu, 20 Apr 2023 08:11:39 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C56F94C10 for ; Thu, 20 Apr 2023 05:10:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992649; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vHb8qGSkfOOzR1UvPl9jTJBSrgP0gJxbai8FbuzD3Cc=; b=FJh5lnej7rzn5er/FTL/1xJsMwn4kWLv1vEvR0+jXM4QtFVDH3VXFH9hyvHZdZSeHhLf2P slDQ4tr5GRGv6PGMIw3OMvqpg8cTALDLqfk0pi5xRxTflE1Muv1M8h1uVCR8K06XOWIH50 1FKJ+04Sj3Xv+T89eIzkcGWUBLr15Q8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-67-SM41RbkZMESFzP3lJGc7QQ-1; Thu, 20 Apr 2023 08:10:45 -0400 X-MC-Unique: SM41RbkZMESFzP3lJGc7QQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7489985A5A3; Thu, 20 Apr 2023 12:10:44 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id B3645492C3E; Thu, 20 Apr 2023 12:10:43 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Sam Li Subject: [PULL 19/20] docs/zoned-storage:add zoned emulation use case Date: Thu, 20 Apr 2023 08:09:47 -0400 Message-Id: <20230420120948.436661-20-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sam Li Add the documentation about the example of using virtio-blk driver to pass the zoned block devices through to the guest. Signed-off-by: Sam Li Message-id: 20230407082528.18841-6-faithilikerun@gmail.com [Fix Sphinx indentation error by turning command-lines into pre-formatted text. --Stefan] Signed-off-by: Stefan Hajnoczi --- docs/devel/zoned-storage.rst | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst index 6a36133e51..e02500c8a3 100644 --- a/docs/devel/zoned-storage.rst +++ b/docs/devel/zoned-storage.rst @@ -38,6 +38,25 @@ When the BlockBackend's BlockLimits model reports a zoned storage device, users like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer APIs for zoned storage emulation or testing. -For example, to test zone_report on a null_blk device using qemu-io is: -$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 --c "zrp offset nr_zones" +For example, to test zone_report on a null_blk device using qemu-io is:: + + $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones" + +To expose the host's zoned block device through virtio-blk, the command line +can be (includes the -device parameter):: + + -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \ + -device virtio-blk-pci,drive=drive0 + +Or only use the -drive parameter:: + + -driver driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on + +Additionally, QEMU has several ways of supporting zoned storage, including: +(1) Using virtio-scsi: --device scsi-block allows for the passing through of +SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU. +(2) PCI device pass-through: While NVMe ZNS emulation is available for testing +purposes, it cannot yet pass through a zoned device from the host. To pass on +the NVMe ZNS device to the guest, use VFIO PCI pass the entire NVMe PCI adapter +through to the guest. Likewise, an HDD HBA can be passed on to QEMU all HDDs +attached to the HBA. From patchwork Thu Apr 20 12:09:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 13218669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6D84C77B73 for ; Thu, 20 Apr 2023 12:11:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234738AbjDTMLs (ORCPT ); Thu, 20 Apr 2023 08:11:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234708AbjDTMLn (ORCPT ); Thu, 20 Apr 2023 08:11:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9573B5241 for ; Thu, 20 Apr 2023 05:10:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681992651; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HlE1Bhcf8AscNeNwVi4LhqLJ90wm+veoJgHNC6/KUQM=; b=herfggU5qf2pGLsqqgxtdwkcvvu6cODJ2wZURTxamiBg2kNJsh4o655AzjloGz33S/gRQ2 Xo4LoAgfUg1Uulnpj/jKI+j24Tllbrtk/xrWddgw1O8nTs3XZKYrfmNGMCXQd4lKu4tfx+ 8w653K0GHXet+bCG8+wEcZ0g1LAmxQY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-1-tXWZQmEjOai0yK4aPntzkg-1; Thu, 20 Apr 2023 08:10:47 -0400 X-MC-Unique: tXWZQmEjOai0yK4aPntzkg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E4926282380E; Thu, 20 Apr 2023 12:10:46 +0000 (UTC) Received: from localhost (unknown [10.39.193.254]) by smtp.corp.redhat.com (Postfix) with ESMTP id 243341410F1C; Thu, 20 Apr 2023 12:10:45 +0000 (UTC) From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Raphael Norwitz , Kevin Wolf , Markus Armbruster , Julia Suvorova , Eric Blake , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , =?utf-8?q?Marc?= =?utf-8?q?-Andr=C3=A9_Lureau?= , "Michael S. Tsirkin" , Thomas Huth , qemu-block@nongnu.org, Cornelia Huck , "Dr. David Alan Gilbert" , =?utf-8?q?Daniel_P=2E_Berra?= =?utf-8?q?ng=C3=A9?= , Peter Maydell , Stefano Garzarella , kvm@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Hanna Reitz , Fam Zheng , Aarushi Mehta , Carlos Santos Subject: [PULL 20/20] tracing: install trace events file only if necessary Date: Thu, 20 Apr 2023 08:09:48 -0400 Message-Id: <20230420120948.436661-21-stefanha@redhat.com> In-Reply-To: <20230420120948.436661-1-stefanha@redhat.com> References: <20230420120948.436661-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Carlos Santos It is not useful when configuring with --enable-trace-backends=nop. Signed-off-by: Carlos Santos Signed-off-by: Stefan Hajnoczi Message-Id: <20230408010410.281263-1-casantos@redhat.com> --- trace/meson.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/trace/meson.build b/trace/meson.build index 8e80be895c..30b1d942eb 100644 --- a/trace/meson.build +++ b/trace/meson.build @@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all', input: trace_events_files, command: [ 'cat', '@INPUT@' ], capture: true, - install: true, + install: get_option('trace_backends') != [ 'nop' ], install_dir: qemu_datadir) if 'ust' in get_option('trace_backends')