From patchwork Mon Nov 27 08:56:35 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13469268
From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de,
    David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz,
    dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake,
    Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature
Date: Mon, 27 Nov 2023 16:56:35 +0800
Message-Id: <20231127085641.3729-2-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 docs/interop/qcow2.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 0f1938f056..458d05371a 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -428,6 +428,9 @@ The fields of the zoned extension are:
                     The offset of zoned metadata structure in the contained
                     image, in bytes.

+          44 - 51:  zd_extension_size
+                    The size of zone descriptor extension data in bytes.
+
 == Full disk encryption header pointer ==

 The full disk encryption header must be present if, and only if, the
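
For orientation, a hedged sketch of the layout this new header field
describes. The helper names below are illustrative only (they do not
exist in QEMU): the zoned metadata region begins with one 64-bit write
pointer per zone, and patch 2 of this series stores the zone descriptor
extension records directly after that table.

    #include <stdint.h>

    /* Size of the zone descriptor extension area: one record per zone,
     * each zd_extension_size bytes (a multiple of 64 bytes, possibly 0). */
    static inline uint64_t zded_area_size(uint32_t zd_extension_size,
                                          uint32_t nr_zones)
    {
        return (uint64_t)zd_extension_size * nr_zones;
    }

    /* The extension records start right after the write pointer table. */
    static inline uint64_t zded_area_offset(uint64_t zonedmeta_offset,
                                            uint32_t nr_zones)
    {
        return zonedmeta_offset + sizeof(uint64_t) * nr_zones;
    }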
From patchwork Mon Nov 27 08:56:36 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13469270
From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de,
    David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz,
    dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake,
    Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata
Date: Mon, 27 Nov 2023 16:56:36 +0800
Message-Id: <20231127085641.3729-3-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

Zone descriptor data is host-defined data that is associated with each
zone. Add zone descriptor extensions to the zonedmeta struct.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c                    | 69 +++++++++++++++++++++++++++++---
 block/qcow2.h                    |  2 +
 include/block/block_int-common.h |  6 +++
 qapi/block-core.json             |  4 ++
 4 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 26f2bb4a87..75dff27216 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -354,7 +354,8 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
 {
     int ret;
     BDRVQcow2State *s = bs->opaque;
-    uint64_t wps_size = s->zoned_header.zonedmeta_size;
+    uint64_t wps_size = s->zoned_header.zonedmeta_size -
+                        s->zded_size;
     g_autofree uint64_t *temp = NULL;
     temp = g_new(uint64_t, wps_size);
     ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
@@ -364,7 +365,17 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
         return ret;
     }

+    g_autofree uint8_t *zded = NULL;
+    zded = g_try_malloc0(s->zded_size);
+    ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size,
+                     s->zded_size, zded, 0);
+    if (ret < 0) {
+        error_report("Can not read zded");
+        return ret;
+    }
+
     memcpy(bs->wps->wp, temp, wps_size);
+    memcpy(bs->zd_extensions, zded, s->zded_size);
     return 0;
 }

@@ -390,6 +401,19 @@ qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt)
         return false;
     }

+    if (zone_opt->zd_extension_size) {
+        if (zone_opt->zd_extension_size & 0x3f) {
+            error_report("zone descriptor extension size must be a "
+                         "multiple of 64B");
+            return false;
+        }
+
+        if ((zone_opt->zd_extension_size >> 6) > 0xff) {
+            error_report("Zone descriptor extension size is too large");
+            return false;
+        }
+    }
+
     if (zone_opt->max_active_zones > zone_opt->nr_zones) {
         error_report("Max_active_zones %" PRIu32 " exceeds "
                      "nr_zones %" PRIu32 ". Set it to nr_zones.",
@@ -676,6 +700,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             zoned_ext.conventional_zones =
                 be32_to_cpu(zoned_ext.conventional_zones);
             zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+            zoned_ext.zd_extension_size =
+                be32_to_cpu(zoned_ext.zd_extension_size);
             zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
             zoned_ext.max_active_zones =
                 be32_to_cpu(zoned_ext.max_active_zones);
@@ -686,7 +712,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
             s->zoned_header = zoned_ext;
             bs->wps = g_malloc(sizeof(BlockZoneWps)
-                               + s->zoned_header.zonedmeta_size);
+                               + zoned_ext.zonedmeta_size - s->zded_size);
+            bs->zd_extensions = g_malloc0(s->zded_size);
             ret = qcow2_refresh_zonedmeta(bs);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "zonedmeta: "
@@ -2264,6 +2291,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.zone_size = s->zoned_header.zone_size;
     bs->bl.zone_capacity = s->zoned_header.zone_capacity;
     bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+    bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }

 static int GRAPH_UNLOCKED
@@ -3534,6 +3562,8 @@ int qcow2_update_header(BlockDriverState *bs)
             .conventional_zones =
                 cpu_to_be32(s->zoned_header.conventional_zones),
             .nr_zones = cpu_to_be32(s->zoned_header.nr_zones),
+            .zd_extension_size =
+                cpu_to_be32(s->zoned_header.zd_extension_size),
             .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
             .max_active_zones =
                 cpu_to_be32(s->zoned_header.max_active_zones),
@@ -4287,6 +4317,15 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         }
         s->zoned_header.max_append_bytes = zone_host_managed->max_append_bytes;

+        uint64_t zded_size = 0;
+        if (zone_host_managed->has_descriptor_extension_size) {
+            s->zoned_header.zd_extension_size =
+                zone_host_managed->descriptor_extension_size;
+            zded_size = s->zoned_header.zd_extension_size *
+                bs->bl.nr_zones;
+        }
+        s->zded_size = zded_size;
+
         if (!qcow2_check_zone_options(&s->zoned_header)) {
             s->zoned_header.zoned = BLK_Z_NONE;
             ret = -EINVAL;
@@ -4294,7 +4333,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         }

         uint32_t nrz = s->zoned_header.nr_zones;
-        zoned_meta_size = sizeof(uint64_t) * nrz;
+        zoned_meta_size = sizeof(uint64_t) * nrz + zded_size;
         g_autofree uint64_t *meta = NULL;
         meta = g_new0(uint64_t, nrz);
@@ -4326,11 +4365,24 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             error_setg_errno(errp, -ret, "Could not zero fill zoned metadata");
             goto out;
         }
-        ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0);
+
+        ret = bdrv_pwrite(blk_bs(blk)->file, offset,
+                          zoned_meta_size - zded_size, meta, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not write zoned metadata "
                              "to disk");
         }
+
+        if (zone_host_managed->has_descriptor_extension_size) {
+            /* Initialize zone descriptor extensions */
+            ret = bdrv_co_pwrite_zeroes(blk_bs(blk)->file,
+                                        offset + zoned_meta_size - zded_size,
+                                        zded_size, 0);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Could not write zone descriptor "
+                                 "extensions to disk");
+                goto out;
+            }
+        }
     } else {
         s->zoned_header.zoned = BLK_Z_NONE;
     }
@@ -4472,6 +4524,7 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
         { BLOCK_OPT_MAX_OPEN_ZONES,     "zone.max-open-zones" },
         { BLOCK_OPT_MAX_ACTIVE_ZONES,   "zone.max-active-zones" },
         { BLOCK_OPT_MAX_APPEND_BYTES,   "zone.max-append-bytes" },
+        { BLOCK_OPT_ZD_EXT_SIZE,        "zone.descriptor-extension-size" },
        { NULL, NULL },
     };
@@ -7061,7 +7114,13 @@ static QemuOptsList qcow2_create_opts = {
         .name = BLOCK_OPT_MAX_OPEN_ZONES,                       \
         .type = QEMU_OPT_NUMBER,                                \
         .help = "max open zones",                               \
-    },
+    },                                                          \
+    {                                                           \
+        .name = BLOCK_OPT_ZD_EXT_SIZE,                          \
+        .type = QEMU_OPT_SIZE,                                  \
+        .help = "zone descriptor extension size (defaults "     \
+                "to 0, must be a multiple of 64 bytes)",        \
+    },                                                          \
     QCOW_COMMON_OPTIONS,
     { /* end of list */ }
 }

diff --git a/block/qcow2.h b/block/qcow2.h
index 7f37bb4034..b7a8f4f4b6 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -249,6 +249,7 @@ typedef struct Qcow2ZonedHeaderExtension {
     uint32_t max_append_bytes;
     uint64_t zonedmeta_size;
     uint64_t zonedmeta_offset;
+    uint32_t zd_extension_size; /* must be multiple of 64 B */
 } QEMU_PACKED Qcow2ZonedHeaderExtension;

 typedef struct Qcow2ZoneListEntry {
@@ -456,6 +457,7 @@ typedef struct BDRVQcow2State {
     uint32_t nr_zones_exp_open;
     uint32_t nr_zones_imp_open;
     uint32_t nr_zones_closed;
+    uint64_t zded_size;
 } BDRVQcow2State;

 typedef struct Qcow2COWRegion {

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 0d231bd1f7..c649f1ca75 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -64,6 +64,7 @@
 #define BLOCK_OPT_MAX_APPEND_BYTES  "zone.max_append_bytes"
 #define BLOCK_OPT_MAX_ACTIVE_ZONES  "zone.max_active_zones"
 #define BLOCK_OPT_MAX_OPEN_ZONES    "zone.max_open_zones"
+#define BLOCK_OPT_ZD_EXT_SIZE       "zd_extension_size"

 #define BLOCK_PROBE_BUF_SIZE        512
@@ -912,6 +913,9 @@ typedef struct BlockLimits {
     uint32_t max_active_zones;

     uint32_t write_granularity;
+
+    /* size of data that is associated with a zone in bytes */
+    uint32_t zd_extension_size;
 } BlockLimits;

 typedef struct BdrvOpBlocker BdrvOpBlocker;
@@ -1270,6 +1274,8 @@ struct BlockDriverState {

     /* array of write pointers' location of each zone in the zoned device. */
     BlockZoneWps *wps;
+
+    uint8_t *zd_extensions;
 };

 struct BlockBackendRootState {

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ef98dc83a0..a7f238371c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5074,12 +5074,16 @@
 #     append request that can be issued to the device.  It must be
 #     512-byte aligned
 #
+# @descriptor-extension-size: The size of zone descriptor extension
+#     data.  Must be a multiple of 64 bytes (since 8.2)
+#
 # Since 8.2
 ##
 { 'struct': 'Qcow2ZoneHostManaged',
   'data': { '*size': 'size',
            '*capacity': 'size',
            '*conventional-zones': 'uint32',
+           '*descriptor-extension-size': 'size',
            '*max-open-zones': 'uint32',
            '*max-active-zones': 'uint32',
            '*max-append-bytes': 'uint32' } }
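
The 0x3f and 0xff constants in qcow2_check_zone_options() above mirror
how NVMe advertises this value: the ZDES field of an LBA format
extension is 8 bits wide and counts units of 64 bytes (compare the
">> 6  /* Units of 64B */" conversion later in this series). A
standalone sketch of the same validation, for reference only:

    #include <stdbool.h>
    #include <stdint.h>

    /* Valid sizes are multiples of 64 bytes that fit the 8-bit ZDES
     * field, i.e. at most 0xff * 64 = 16320 bytes (0 means no
     * extension data). */
    static bool zd_extension_size_valid(uint32_t size)
    {
        if (size & 0x3f) {
            return false;           /* not a multiple of 64 bytes */
        }
        return (size >> 6) <= 0xff; /* must fit in ZDES */
    }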
From patchwork Mon Nov 27 08:56:37 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13469272
From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de,
    David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz,
    dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake,
    Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer
Date: Mon, 27 Nov 2023 16:56:37 +0800
Message-Id: <20231127085641.3729-4-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

The zone information is contained in the BlockLimits fields. Add
blk_get_*() functions to access those fields through the block layer,
and update the zone information accesses in the NVMe device emulation
accordingly.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c             | 72 +++++++++++++++++++++++++++++++
 hw/nvme/ctrl.c                    | 34 +++++----------
 hw/nvme/ns.c                      | 61 ++++++++------------------
 hw/nvme/nvme.h                    |  3 --
 include/sysemu/block-backend-io.h |  9 ++++
 5 files changed, 111 insertions(+), 68 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index ec21148806..666df9cfea 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2380,6 +2380,78 @@ int blk_get_max_iov(BlockBackend *blk)
     return blk->root->bs->bl.max_iov;
 }

+uint8_t blk_get_zone_model(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zoned : 0;
+}
+
+uint32_t blk_get_zone_size(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zone_size : 0;
+}
+
+uint32_t blk_get_zone_capacity(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zone_capacity : 0;
+}
+
+uint32_t blk_get_max_open_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_open_zones : 0;
+}
+
+uint32_t blk_get_max_active_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_active_zones : 0;
+}
+
+uint32_t blk_get_max_append_sectors(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_append_sectors : 0;
+}
+
+uint32_t blk_get_nr_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.nr_zones : 0;
+}
+
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.write_granularity : 0;
+}
+
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->wps : NULL;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
     IO_CODE();

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e..e64b021454 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
 static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
                                          uint32_t opn, uint32_t zrwa)
 {
-    if (ns->params.max_active_zones != 0 &&
-        ns->nr_active_zones + act > ns->params.max_active_zones) {
-        trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-        return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
-    }
-
-    if (ns->params.max_open_zones != 0 &&
-        ns->nr_open_zones + opn > ns->params.max_open_zones) {
-        trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-        return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
-    }
-
     if (zrwa > ns->zns.numzrwa) {
         return NVME_NOZRWA | NVME_DNR;
     }
@@ -1988,9 +1976,9 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone)
 static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
 {
     NvmeZone *zone;
+    int moz = blk_get_max_open_zones(ns->blkconf.blk);

-    if (ns->params.max_open_zones &&
-        ns->nr_open_zones == ns->params.max_open_zones) {
+    if (moz && ns->nr_open_zones == moz) {
         zone = QTAILQ_FIRST(&ns->imp_open_zones);

         if (zone) {
             /*
@@ -2160,7 +2148,7 @@ void nvme_rw_complete_cb(void *opaque, int ret)
         block_acct_done(stats, acct);
     }

-    if (ns->params.zoned && nvme_is_write(req)) {
+    if (blk_get_zone_model(blk) && nvme_is_write(req)) {
         nvme_finalize_zoned_write(ns, req);
     }

@@ -2882,7 +2870,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
         goto out;
     }

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         nvme_advance_zone_wp(ns, iocb->zone, nlb);
     }

@@ -2994,7 +2982,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret)
         goto invalid;
     }

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
         if (status) {
             goto invalid;
@@ -3088,7 +3076,7 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
         }
     }

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         status = nvme_check_zone_read(ns, slba, nlb);
         if (status) {
             goto invalid;
@@ -3164,7 +3152,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)

     iocb->slba = le64_to_cpu(copy->sdlba);

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba);
         if (!iocb->zone) {
             status = NVME_LBA_RANGE | NVME_DNR;
@@ -3434,7 +3422,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
         goto invalid;
     }

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(blk)) {
         status = nvme_check_zone_read(ns, slba, nlb);
         if (status) {
             trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status);
@@ -3549,7 +3537,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
         goto invalid;
     }

-    if (ns->params.zoned) {
+    if (blk_get_zone_model(blk)) {
         zone = nvme_get_zone_by_slba(ns, slba);
         assert(zone);

@@ -3667,7 +3655,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c,
     uint32_t dw10 = le32_to_cpu(c->cdw10);
     uint32_t dw11 = le32_to_cpu(c->cdw11);

-    if (!ns->params.zoned) {
+    if (!blk_get_zone_model(ns->blkconf.blk)) {
         trace_pci_nvme_err_invalid_opc(c->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
@@ -6527,7 +6515,7 @@ done:

 static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi)
 {
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         return NVME_INVALID_FORMAT | NVME_DNR;
     }

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 0eabcf5cf5..82d4f7932d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -25,7 +25,6 @@
 #include "trace.h"

 #define MIN_DISCARD_GRANULARITY (4 * KiB)
-#define NVME_DEFAULT_ZONE_SIZE (128 * MiB)

 void nvme_ns_init_format(NvmeNamespace *ns)
 {
@@ -177,19 +176,11 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)

 static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 {
-    uint64_t zone_size, zone_cap;
+    BlockBackend *blk = ns->blkconf.blk;
+    uint64_t zone_size = blk_get_zone_size(blk);
+    uint64_t zone_cap = blk_get_zone_capacity(blk);

     /* Make sure that the values of ZNS properties are sane */
-    if (ns->params.zone_size_bs) {
-        zone_size = ns->params.zone_size_bs;
-    } else {
-        zone_size = NVME_DEFAULT_ZONE_SIZE;
-    }
-    if (ns->params.zone_cap_bs) {
-        zone_cap = ns->params.zone_cap_bs;
-    } else {
-        zone_cap = zone_size;
-    }
     if (zone_cap > zone_size) {
         error_setg(errp, "zone capacity %"PRIu64"B exceeds "
                    "zone size %"PRIu64"B", zone_cap, zone_size);
@@ -266,6 +257,7 @@ static void nvme_ns_zoned_init_state(NvmeNamespace *ns)

 static void nvme_ns_init_zoned(NvmeNamespace *ns)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     NvmeIdNsZoned *id_ns_z;
     int i;

@@ -274,8 +266,8 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
     id_ns_z = g_new0(NvmeIdNsZoned, 1);

     /* MAR/MOR are zeroes-based, FFFFFFFFFh means no limit */
-    id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
-    id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
+    id_ns_z->mar = cpu_to_le32(blk_get_max_active_zones(blk) - 1);
+    id_ns_z->mor = cpu_to_le32(blk_get_max_open_zones(blk) - 1);
     id_ns_z->zoc = 0;
     id_ns_z->ozcs = ns->params.cross_zone_read ?
         NVME_ID_NS_ZONED_OZCS_RAZB : 0x00;
@@ -539,6 +531,7 @@ static bool nvme_ns_init_fdp(NvmeNamespace *ns, Error **errp)

 static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     unsigned int pi_size;

     if (!ns->blkconf.blk) {
@@ -577,25 +570,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
         return -1;
     }

-    if (ns->params.zoned && ns->endgrp && ns->endgrp->fdp.enabled) {
+    if (blk_get_zone_model(blk) && ns->endgrp && ns->endgrp->fdp.enabled) {
         error_setg(errp, "cannot be a zoned- in an FDP configuration");
         return -1;
     }

-    if (ns->params.zoned) {
-        if (ns->params.max_active_zones) {
-            if (ns->params.max_open_zones > ns->params.max_active_zones) {
-                error_setg(errp, "max_open_zones (%u) exceeds "
-                           "max_active_zones (%u)", ns->params.max_open_zones,
-                           ns->params.max_active_zones);
-                return -1;
-            }
-
-            if (!ns->params.max_open_zones) {
-                ns->params.max_open_zones = ns->params.max_active_zones;
-            }
-        }
-
+    if (blk_get_zone_model(blk)) {
         if (ns->params.zd_extension_size) {
             if (ns->params.zd_extension_size & 0x3f) {
                 error_setg(errp, "zone descriptor extension size must be a "
@@ -630,14 +610,14 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
             return -1;
         }

-        if (ns->params.max_active_zones) {
-            if (ns->params.numzrwa > ns->params.max_active_zones) {
+        int maz = blk_get_max_active_zones(blk);
+        if (maz) {
+            if (ns->params.numzrwa > maz) {
                 error_setg(errp, "number of zone random write area "
                            "resources (zoned.numzrwa, %d) must be less "
                            "than or equal to maximum active resources "
                            "(zoned.max_active_zones, %d)",
-                           ns->params.numzrwa,
-                           ns->params.max_active_zones);
+                           ns->params.numzrwa, maz);
                 return -1;
             }
         }
@@ -660,7 +640,7 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
     if (nvme_ns_init(ns, errp)) {
         return -1;
     }
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         if (nvme_ns_zoned_check_calc_geometry(ns, errp) != 0) {
             return -1;
         }
@@ -683,15 +663,17 @@ void nvme_ns_drain(NvmeNamespace *ns)

 void nvme_ns_shutdown(NvmeNamespace *ns)
 {
-    blk_flush(ns->blkconf.blk);
-    if (ns->params.zoned) {
+    BlockBackend *blk = ns->blkconf.blk;
+
+    blk_flush(blk);
+    if (blk_get_zone_model(blk)) {
         nvme_zoned_ns_shutdown(ns);
     }
 }

 void nvme_ns_cleanup(NvmeNamespace *ns)
 {
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         g_free(ns->id_ns_zoned);
         g_free(ns->zone_array);
         g_free(ns->zd_extensions);
@@ -806,11 +788,6 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
     DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
     DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
-    DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
-    DEFINE_PROP_SIZE("zoned.zone_size", NvmeNamespace, params.zone_size_bs,
-                     NVME_DEFAULT_ZONE_SIZE),
-    DEFINE_PROP_SIZE("zoned.zone_capacity", NvmeNamespace, params.zone_cap_bs,
-                     0),
     DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace,
                      params.cross_zone_read, false),
     DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace,

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5f2ae7b28b..76677a86e9 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -189,10 +189,7 @@ typedef struct NvmeNamespaceParams {
     uint32_t mcl;
     uint8_t  msrc;

-    bool     zoned;
     bool     cross_zone_read;
-    uint64_t zone_size_bs;
-    uint64_t zone_cap_bs;
     uint32_t max_active_zones;
     uint32_t max_open_zones;
     uint32_t zd_extension_size;

diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index d174275a5c..44e44954fa 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -99,6 +99,15 @@ void blk_error_action(BlockBackend *blk, BlockErrorAction action,
 void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
+uint8_t blk_get_zone_model(BlockBackend *blk);
+uint32_t blk_get_zone_size(BlockBackend *blk);
+uint32_t blk_get_zone_capacity(BlockBackend *blk);
+uint32_t blk_get_max_open_zones(BlockBackend *blk);
+uint32_t blk_get_max_active_zones(BlockBackend *blk);
+uint32_t blk_get_max_append_sectors(BlockBackend *blk);
+uint32_t blk_get_nr_zones(BlockBackend *blk);
+uint32_t blk_get_write_granularity(BlockBackend *blk);
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk);

 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
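
A short usage sketch of the new accessors (illustrative only; assumes
QEMU's block headers): a device model can derive zone geometry from the
attached BlockBackend instead of duplicating it in device properties.

    /* total bytes covered by zones, 0 for a non-zoned backend */
    static uint64_t example_total_zoned_bytes(BlockBackend *blk)
    {
        if (!blk_get_zone_model(blk)) {
            return 0;
        }
        return (uint64_t)blk_get_nr_zones(blk) * blk_get_zone_size(blk);
    }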
From patchwork Mon Nov 27 08:56:38 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13469275
From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de,
    David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz,
    dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake,
    Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions
Date: Mon, 27 Nov 2023 16:56:38 +0800
Message-Id: <20231127085641.3729-5-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c             | 16 ++++++++++++++++
 hw/nvme/ctrl.c                    | 20 ++++++++++++++------
 hw/nvme/ns.c                      | 24 ++++--------------------
 hw/nvme/nvme.h                    |  7 -------
 include/sysemu/block-backend-io.h |  2 ++
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 666df9cfea..fcdcbe28bf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2452,6 +2452,22 @@ BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
     return bs ? bs->wps : NULL;
 }

+uint8_t *blk_get_zone_extension(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->zd_extensions : NULL;
+}
+
+uint32_t blk_get_zd_ext_size(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zd_extension_size : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
     IO_CODE();

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e64b021454..dae6f00e4f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4004,6 +4004,12 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone,
     return NVME_SUCCESS;
 }

+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+                                             uint32_t zone_idx)
+{
+    return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)];
+}
+
 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
@@ -4088,11 +4094,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
     case NVME_ZONE_ACTION_SET_ZD_EXT:
         trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
-        if (all || !ns->params.zd_extension_size) {
+        if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) {
             return NVME_INVALID_FIELD | NVME_DNR;
         }
         zd_ext = nvme_get_zd_extension(ns, zone_idx);
-        status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req);
+        status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req);
         if (status) {
             trace_pci_nvme_err_zd_extension_map_error(zone_idx);
             return status;
@@ -4183,7 +4189,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
     if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }

-    if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) {
+    if (zra == NVME_ZONE_REPORT_EXTENDED &&
+        !blk_get_zd_ext_size(ns->blkconf.blk)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
@@ -4205,7 +4212,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
     zone_entry_sz = sizeof(NvmeZoneDescr);

     if (zra == NVME_ZONE_REPORT_EXTENDED) {
-        zone_entry_sz += ns->params.zd_extension_size;
+        zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk);
     }

     max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
@@ -4243,11 +4250,12 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
         }

         if (zra == NVME_ZONE_REPORT_EXTENDED) {
+            int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk);
             if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
                 memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx),
-                       ns->params.zd_extension_size);
+                       zd_ext_size);
             }
-            buf_p += ns->params.zd_extension_size;
+            buf_p += zd_ext_size;
         }

         max_zones--;

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 82d4f7932d..45c08391f5 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -218,15 +218,15 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)

 static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     uint64_t start = 0, zone_size = ns->zone_size;
     uint64_t capacity = ns->num_zones * zone_size;
     NvmeZone *zone;
     int i;

     ns->zone_array = g_new0(NvmeZone, ns->num_zones);
-    if (ns->params.zd_extension_size) {
-        ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
-                                      ns->num_zones);
+    if (blk_get_zone_extension(blk)) {
+        ns->zd_extensions = blk_get_zone_extension(blk);
     }

     QTAILQ_INIT(&ns->exp_open_zones);
@@ -275,7 +275,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
     for (i = 0; i <= ns->id_ns.nlbaf; i++) {
         id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size);
         id_ns_z->lbafe[i].zdes =
-            ns->params.zd_extension_size >> 6; /* Units of 64B */
+            blk_get_zd_ext_size(blk) >> 6; /* Units of 64B */
     }

     if (ns->params.zrwas) {
@@ -576,19 +576,6 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     }

     if (blk_get_zone_model(blk)) {
-        if (ns->params.zd_extension_size) {
-            if (ns->params.zd_extension_size & 0x3f) {
-                error_setg(errp, "zone descriptor extension size must be a "
-                           "multiple of 64B");
-                return -1;
-            }
-            if ((ns->params.zd_extension_size >> 6) > 0xff) {
-                error_setg(errp,
-                           "zone descriptor extension size is too large");
-                return -1;
-            }
-        }
-
         if (ns->params.zrwas) {
             if (ns->params.zrwas % ns->blkconf.logical_block_size) {
                 error_setg(errp, "zone random write area size (zoned.zrwas "
@@ -676,7 +663,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
     if (blk_get_zone_model(ns->blkconf.blk)) {
         g_free(ns->id_ns_zoned);
         g_free(ns->zone_array);
-        g_free(ns->zd_extensions);
     }

     if (ns->endgrp && ns->endgrp->fdp.enabled) {
@@ -794,8 +780,6 @@ static Property nvme_ns_props[] = {
                        params.max_active_zones, 0),
     DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace,
                        params.max_open_zones, 0),
-    DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
-                       params.zd_extension_size, 0),
     DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0),
     DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0),
     DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1),

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 76677a86e9..37007952fc 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -192,7 +192,6 @@ typedef struct NvmeNamespaceParams {
     bool     cross_zone_read;
     uint32_t max_active_zones;
     uint32_t max_open_zones;
-    uint32_t zd_extension_size;

     uint32_t numzrwa;
     uint64_t zrwas;
@@ -315,12 +314,6 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
            st != NVME_ZONE_STATE_OFFLINE;
 }

-static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
-                                             uint32_t zone_idx)
-{
-    return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size];
-}
-
 static inline void nvme_aor_inc_open(NvmeNamespace *ns)
 {
     assert(ns->nr_open_zones >= 0);

diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 44e44954fa..ab388801b1 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -108,6 +108,8 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk);
 uint32_t blk_get_nr_zones(BlockBackend *blk);
 uint32_t blk_get_write_granularity(BlockBackend *blk);
 BlockZoneWps *blk_get_zone_wps(BlockBackend *blk);
+uint8_t *blk_get_zone_extension(BlockBackend *blk);
+uint32_t blk_get_zd_ext_size(BlockBackend *blk);

 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
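
The extensions live in one flat buffer owned by the block layer,
indexed as zone_idx * zd_extension_size, which is exactly what
nvme_get_zd_extension() computes. An illustrative consumer (not from
the patch; assumes QEMU's block headers, with memcpy() coming in via
the usual osdep include):

    /* copy zone zone_idx's descriptor extension out of the flat buffer */
    static void example_read_zd_ext(BlockBackend *blk, uint32_t zone_idx,
                                    uint8_t *out)
    {
        uint32_t sz = blk_get_zd_ext_size(blk);   /* 0 when not configured */
        const uint8_t *base = blk_get_zone_extension(blk);

        if (sz && base) {
            memcpy(out, base + (size_t)zone_idx * sz, sz);
        }
    }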
From patchwork Mon Nov 27 08:56:39 2023
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13469271
From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de,
    David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz,
    dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake,
    Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent
Date: Mon, 27 Nov 2023 16:56:39 +0800
Message-Id: <20231127085641.3729-6-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

NVMe ZNS devices follow the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This patch makes the
metadata of the ZNS emulation persistent by using the new block layer
APIs: the ZNS device calls the zone report and zone management APIs of
the block layer, which handle the zone state transitions and manage
zone resources.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c                    |    3 +
 hw/nvme/ctrl.c                   | 1106 +++++++-----------------------
 hw/nvme/ns.c                     |   77 +--
 hw/nvme/nvme.h                   |   85 +--
 include/block/block-common.h     |    8 +
 include/block/block_int-common.h |    2 +
 6 files changed, 264 insertions(+), 1017 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 75dff27216..dfaf5566e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5043,6 +5043,9 @@ static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     case BLK_ZO_RESET:
         ret = qcow2_reset_zone(bs, index, len);
         break;
+    case BLK_ZO_OFFLINE:
+        /* There are no transitions from the offline state to any other state */
+        break;
     default:
         error_report("Unsupported zone op: 0x%x", op);
         ret = -ENOTSUP;

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dae6f00e4f..b9ed3495e1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid,
     return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
 }

-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
-                                   NvmeZoneState state)
-{
-    if (QTAILQ_IN_USE(zone, entry)) {
-        switch (nvme_get_zone_state(zone)) {
-        case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-            QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-            QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_CLOSED:
-            QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_FULL:
-            QTAILQ_REMOVE(&ns->full_zones, zone, entry);
-        default:
-            ;
-        }
-    }
-
-    nvme_set_zone_state(zone, state);
-
-    switch (state) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_CLOSED:
-        QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_FULL:
-        QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry);
-    case NVME_ZONE_STATE_READ_ONLY:
-        break;
-    default:
-        zone->d.za = 0;
-    }
-}
-
-static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
-                                         uint32_t opn, uint32_t zrwa)
-{
-    if (zrwa > ns->zns.numzrwa) {
-        return NVME_NOZRWA | NVME_DNR;
-    }
-
-    return NVME_SUCCESS;
-}
-
-/*
- * Check if we can open a zone without exceeding open/active limits.
- * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
- */
-static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
-{
-    return nvme_zns_check_resources(ns, act, opn, 0);
-}
-
 static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer *ebuf)
 {
     NvmeFdpEvent *ret = NULL;
@@ -1769,346 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
            slba / ns->zone_size;
 }

-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-    uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-    if (zone_idx >= ns->num_zones) {
-        return NULL;
-    }
-
-    return &ns->zone_array[zone_idx];
-}
-
-static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone)
-{
-    uint64_t zslba = zone->d.zslba;
-
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_CLOSED:
-        return NVME_SUCCESS;
-    case NVME_ZONE_STATE_FULL:
-        trace_pci_nvme_err_zone_is_full(zslba);
-        return NVME_ZONE_FULL;
-    case NVME_ZONE_STATE_OFFLINE:
-        trace_pci_nvme_err_zone_is_offline(zslba);
-        return NVME_ZONE_OFFLINE;
-    case NVME_ZONE_STATE_READ_ONLY:
-        trace_pci_nvme_err_zone_is_read_only(zslba);
-        return NVME_ZONE_READ_ONLY;
-    default:
-        assert(false);
-    }
-
-    return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
-                                      uint64_t slba, uint32_t nlb)
-{
-    uint64_t zcap = nvme_zone_wr_boundary(zone);
-    uint16_t status;
-
-    status = nvme_check_zone_state_for_write(zone);
-    if (status) {
-        return status;
-    }
-
-    if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-        uint64_t ezrwa = zone->w_ptr + 2 * ns->zns.zrwas;
-
-        if (slba < zone->w_ptr || slba + nlb > ezrwa) {
-            trace_pci_nvme_err_zone_invalid_write(slba, zone->w_ptr);
-            return NVME_ZONE_INVALID_WRITE;
-        }
-    } else {
-        if (unlikely(slba != zone->w_ptr)) {
-            trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba,
-                                               zone->w_ptr);
-            return NVME_ZONE_INVALID_WRITE;
-        }
-    }
-
-    if (unlikely((slba + nlb) > zcap)) {
-        trace_pci_nvme_err_zone_boundary(slba, nlb, zcap);
-        return NVME_ZONE_BOUNDARY_ERROR;
-    }
-
-    return NVME_SUCCESS;
-}
-
-static uint16_t nvme_check_zone_state_for_read(NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_FULL:
-    case NVME_ZONE_STATE_CLOSED:
-    case NVME_ZONE_STATE_READ_ONLY:
-        return NVME_SUCCESS;
-    case NVME_ZONE_STATE_OFFLINE:
-        trace_pci_nvme_err_zone_is_offline(zone->d.zslba);
-        return NVME_ZONE_OFFLINE;
-    default:
-        assert(false);
-    }
-
-    return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba,
-                                     uint32_t nlb)
-{
-    NvmeZone *zone;
-    uint64_t bndry, end;
-    uint16_t status;
-
-    zone = nvme_get_zone_by_slba(ns, slba);
-    assert(zone);
-
-    bndry = nvme_zone_rd_boundary(ns, zone);
-    end = slba + nlb;
-
-    status = nvme_check_zone_state_for_read(zone);
-    if (status) {
-        ;
-    } else if (unlikely(end > bndry)) {
-        if (!ns->params.cross_zone_read) {
-            status = NVME_ZONE_BOUNDARY_ERROR;
-        } else {
-            /*
-             * Read across zone boundary - check that all subsequent
-             * zones that are being read have an appropriate state.
-             */
-            do {
-                zone++;
-                status = nvme_check_zone_state_for_read(zone);
-                if (status) {
-                    break;
-                }
-            } while (end > nvme_zone_rd_boundary(ns, zone));
-        }
-    }
-
-    return status;
-}
-
-static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_FULL:
-        return NVME_SUCCESS;
-
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        /* fallthrough */
-    case NVME_ZONE_STATE_CLOSED:
-        nvme_aor_dec_active(ns);
-
-        if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-            zone->d.za &= ~NVME_ZA_ZRWA_VALID;
-            if (ns->params.numzrwa) {
-                ns->zns.numzrwa++;
-            }
-        }
-
-        /* fallthrough */
-    case NVME_ZONE_STATE_EMPTY:
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL);
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
-        /* fall through */
-    case NVME_ZONE_STATE_CLOSED:
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        /* fallthrough */
-    case NVME_ZONE_STATE_CLOSED:
-        nvme_aor_dec_active(ns);
-
-        if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-            if (ns->params.numzrwa) {
-                ns->zns.numzrwa++;
-            }
-        }
-
-        /* fallthrough */
-    case NVME_ZONE_STATE_FULL:
-        zone->w_ptr = zone->d.zslba;
-        zone->d.wp = zone->w_ptr;
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY);
-        /* fallthrough */
-    case NVME_ZONE_STATE_EMPTY:
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
-{
-    NvmeZone *zone;
-    int moz = blk_get_max_open_zones(ns->blkconf.blk);
-
-    if (moz && ns->nr_open_zones == moz) {
-        zone = QTAILQ_FIRST(&ns->imp_open_zones);
-        if (zone) {
-            /*
-             * Automatically close this implicitly open zone.
-             */
-            QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-            nvme_zrm_close(ns, zone);
-        }
-    }
-}
-
 enum {
     NVME_ZRM_AUTO = 1 << 0,
     NVME_ZRM_ZRWA = 1 << 1,
 };

-static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, int flags)
-{
-    int act = 0;
-    uint16_t status;
-
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-        act = 1;
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_CLOSED:
-        if (n->params.auto_transition_zones) {
-            nvme_zrm_auto_transition_zone(ns);
-        }
-        status = nvme_zns_check_resources(ns, act, 1,
-                                          (flags & NVME_ZRM_ZRWA) ? 1 : 0);
-        if (status) {
-            return status;
-        }
-
-        if (act) {
-            nvme_aor_inc_active(ns);
-        }
-
-        nvme_aor_inc_open(ns);
-
-        if (flags & NVME_ZRM_AUTO) {
-            nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN);
-            return NVME_SUCCESS;
-        }
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        if (flags & NVME_ZRM_AUTO) {
-            return NVME_SUCCESS;
-        }
-
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN);
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        if (flags & NVME_ZRM_ZRWA) {
-            ns->zns.numzrwa--;
-
-            zone->d.za |= NVME_ZA_ZRWA_VALID;
-        }
-
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns,
-                                     NvmeZone *zone)
-{
-    return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO);
-}
-
-static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
-                                 uint32_t nlb)
-{
-    zone->d.wp += nlb;
-
-    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
-        nvme_zrm_finish(ns, zone);
-    }
-}
-
-static void nvme_zoned_zrwa_implicit_flush(NvmeNamespace *ns, NvmeZone *zone,
-                                           uint32_t nlbc)
-{
-    uint16_t nzrwafgs = DIV_ROUND_UP(nlbc, ns->zns.zrwafg);
-
-    nlbc = nzrwafgs * ns->zns.zrwafg;
-
-    trace_pci_nvme_zoned_zrwa_implicit_flush(zone->d.zslba, nlbc);
-
-    zone->w_ptr += nlbc;
-
-    nvme_advance_zone_wp(ns, zone, nlbc);
-}
-
-static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req)
-{
-    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
-    NvmeZone *zone;
-    uint64_t slba;
-    uint32_t nlb;
-
-    slba = le64_to_cpu(rw->slba);
-    nlb = le16_to_cpu(rw->nlb) + 1;
-    zone = nvme_get_zone_by_slba(ns, slba);
-    assert(zone);
-
-    if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-        uint64_t ezrwa = zone->w_ptr + ns->zns.zrwas - 1;
-        uint64_t elba = slba + nlb - 1;
-
-        if (elba > ezrwa) {
-            nvme_zoned_zrwa_implicit_flush(ns, zone, elba - ezrwa);
-        }
-
-        return;
-    }
-
-    nvme_advance_zone_wp(ns, zone, nlb);
-}
-
 static inline bool nvme_is_write(NvmeRequest *req)
 {
     NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -2148,10 +1752,6 @@ void nvme_rw_complete_cb(void *opaque, int ret)
         block_acct_done(stats, acct);
     }

-    if (blk_get_zone_model(blk) && nvme_is_write(req)) {
-        nvme_finalize_zoned_write(ns, req);
-    }
-
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }

@@ -2856,8 +2456,6 @@ static inline uint16_t nvme_check_copy_mcl(NvmeNamespace *ns,
 static void nvme_copy_out_completed_cb(void *opaque, int ret)
 {
     NvmeCopyAIOCB *iocb = opaque;
-    NvmeRequest *req = iocb->req;
-    NvmeNamespace *ns = req->ns;
     uint32_t nlb;

     nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL,
@@ -2870,10 +2468,6 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
         goto out;
     }

-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        nvme_advance_zone_wp(ns, iocb->zone, nlb);
-    }
-
     iocb->idx++;
     iocb->slba += nlb;
 out:
@@ -2982,17 +2576,6 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret)
         goto invalid;
     }

-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
-        if (status) {
-            goto invalid;
-        }
-
-        if (!(iocb->zone->d.za & NVME_ZA_ZRWA_VALID)) {
-            iocb->zone->w_ptr += nlb;
-        }
-    }
-
     qemu_iovec_reset(&iocb->iov);
     qemu_iovec_add(&iocb->iov, iocb->bounce, len);

@@ -3076,13 +2659,6 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
         }
     }

-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        status = nvme_check_zone_read(ns, slba, nlb);
-        if (status) {
-            goto invalid;
-        }
-    }
-
     qemu_iovec_reset(&iocb->iov);
     qemu_iovec_add(&iocb->iov, iocb->bounce, len);

@@
-3152,19 +2728,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) iocb->slba = le64_to_cpu(copy->sdlba); - if (blk_get_zone_model(ns->blkconf.blk)) { - iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba); - if (!iocb->zone) { - status = NVME_LBA_RANGE | NVME_DNR; - goto invalid; - } - - status = nvme_zrm_auto(n, ns, iocb->zone); - if (status) { - goto invalid; - } - } - status = nvme_check_copy_mcl(ns, iocb, nr); if (status) { goto invalid; @@ -3422,14 +2985,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) goto invalid; } - if (blk_get_zone_model(blk)) { - status = nvme_check_zone_read(ns, slba, nlb); - if (status) { - trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); - goto invalid; - } - } - if (NVME_ERR_REC_DULBE(ns->features.err_rec)) { status = nvme_check_dulbe(ns, slba, nlb); if (status) { @@ -3505,8 +3060,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, uint64_t data_size = nvme_l2b(ns, nlb); uint64_t mapped_size = data_size; uint64_t data_offset; - NvmeZone *zone; - NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -3538,32 +3091,20 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } if (blk_get_zone_model(blk)) { - zone = nvme_get_zone_by_slba(ns, slba); - assert(zone); + uint32_t zone_size = blk_get_zone_size(blk); + uint32_t zone_idx = slba / zone_size; + int64_t zone_start = zone_idx * zone_size; if (append) { bool piremap = !!(ctrl & NVME_RW_PIREMAP); - if (unlikely(zone->d.za & NVME_ZA_ZRWA_VALID)) { - return NVME_INVALID_ZONE_OP | NVME_DNR; - } - - if (unlikely(slba != zone->d.zslba)) { - trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba); - status = NVME_INVALID_FIELD; - goto invalid; - } - if (n->params.zasl && data_size > (uint64_t)n->page_size << n->params.zasl) { trace_pci_nvme_err_zasl(data_size); return NVME_INVALID_FIELD | NVME_DNR; } - slba = zone->w_ptr; rw->slba = cpu_to_le64(slba); - res->slba = cpu_to_le64(slba); - switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { case NVME_ID_NS_DPS_TYPE_1: if (!piremap) { @@ -3575,7 +3116,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, case NVME_ID_NS_DPS_TYPE_2: if (piremap) { uint32_t reftag = le32_to_cpu(rw->reftag); - rw->reftag = cpu_to_le32(reftag + (slba - zone->d.zslba)); + rw->reftag = cpu_to_le32(reftag + (slba - zone_start)); } break; @@ -3589,19 +3130,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } } - status = nvme_check_zone_write(ns, zone, slba, nlb); - if (status) { - goto invalid; - } - - status = nvme_zrm_auto(n, ns, zone); - if (status) { - goto invalid; - } - - if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) { - zone->w_ptr += nlb; - } } else if (ns->endgrp && ns->endgrp->fdp.enabled) { nvme_do_write_fdp(n, req, slba, nlb); } @@ -3644,6 +3172,23 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return nvme_do_write(n, req, false, true); } +typedef struct NvmeZoneCmdAIOCB { + NvmeRequest *req; + NvmeCmd *cmd; + NvmeCtrl *n; + + union { + struct { + uint32_t partial; + unsigned int nr_zones; + BlockZoneDescriptor *zones; + } zone_report_data; + struct { + int64_t offset; + } zone_append_data; + }; +} NvmeZoneCmdAIOCB; + static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req) { return nvme_do_write(n, req, true, false); @@ -3655,7 +3200,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, uint32_t dw10 = le32_to_cpu(c->cdw10); uint32_t dw11 
= le32_to_cpu(c->cdw11); - if (blk_get_zone_model(ns->blkconf.blk)) { + if (!blk_get_zone_model(ns->blkconf.blk)) { trace_pci_nvme_err_invalid_opc(c->opcode); return NVME_INVALID_OPCODE | NVME_DNR; } @@ -3673,198 +3218,21 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, return NVME_SUCCESS; } -typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, NvmeZoneState, - NvmeRequest *); - -enum NvmeZoneProcessingMask { - NVME_PROC_CURRENT_ZONE = 0, - NVME_PROC_OPENED_ZONES = 1 << 0, - NVME_PROC_CLOSED_ZONES = 1 << 1, - NVME_PROC_READ_ONLY_ZONES = 1 << 2, - NVME_PROC_FULL_ZONES = 1 << 3, -}; - -static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd; - int flags = 0; - - if (cmd->zsflags & NVME_ZSFLAG_ZRWA_ALLOC) { - uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs); - - if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) { - return NVME_INVALID_ZONE_OP | NVME_DNR; - } - - if (zone->w_ptr % ns->zns.zrwafg) { - return NVME_NOZRWA | NVME_DNR; - } - - flags = NVME_ZRM_ZRWA; - } - - return nvme_zrm_open_flags(nvme_ctrl(req), ns, zone, flags); -} - -static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - return nvme_zrm_close(ns, zone); -} - -static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - return nvme_zrm_finish(ns, zone); -} - -static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - switch (state) { - case NVME_ZONE_STATE_READ_ONLY: - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE); - /* fall through */ - case NVME_ZONE_STATE_OFFLINE: - return NVME_SUCCESS; - default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone) -{ - uint16_t status; - uint8_t state = nvme_get_zone_state(zone); - - if (state == NVME_ZONE_STATE_EMPTY) { - status = nvme_aor_check(ns, 1, 0); - if (status) { - return status; - } - nvme_aor_inc_active(ns); - zone->d.za |= NVME_ZA_ZD_EXT_VALID; - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); - return NVME_SUCCESS; - } - - return NVME_ZONE_INVAL_TRANSITION; -} - -static uint16_t nvme_bulk_proc_zone(NvmeNamespace *ns, NvmeZone *zone, - enum NvmeZoneProcessingMask proc_mask, - op_handler_t op_hndlr, NvmeRequest *req) -{ - uint16_t status = NVME_SUCCESS; - NvmeZoneState zs = nvme_get_zone_state(zone); - bool proc_zone; - - switch (zs) { - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - proc_zone = proc_mask & NVME_PROC_OPENED_ZONES; - break; - case NVME_ZONE_STATE_CLOSED: - proc_zone = proc_mask & NVME_PROC_CLOSED_ZONES; - break; - case NVME_ZONE_STATE_READ_ONLY: - proc_zone = proc_mask & NVME_PROC_READ_ONLY_ZONES; - break; - case NVME_ZONE_STATE_FULL: - proc_zone = proc_mask & NVME_PROC_FULL_ZONES; - break; - default: - proc_zone = false; - } - - if (proc_zone) { - status = op_hndlr(ns, zone, zs, req); - } - - return status; -} - -static uint16_t nvme_do_zone_op(NvmeNamespace *ns, NvmeZone *zone, - enum NvmeZoneProcessingMask proc_mask, - op_handler_t op_hndlr, NvmeRequest *req) -{ - NvmeZone *next; - uint16_t status = NVME_SUCCESS; - int i; - - if (!proc_mask) { - status = op_hndlr(ns, zone, nvme_get_zone_state(zone), req); - } else { - if (proc_mask & NVME_PROC_CLOSED_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { - status = 
nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - if (proc_mask & NVME_PROC_OPENED_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - - QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - if (proc_mask & NVME_PROC_FULL_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->full_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - - if (proc_mask & NVME_PROC_READ_ONLY_ZONES) { - for (i = 0; i < ns->num_zones; i++, zone++) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - } - -out: - return status; -} - -typedef struct NvmeZoneResetAIOCB { +typedef struct NvmeZoneMgmtAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; int ret; bool all; - int idx; - NvmeZone *zone; -} NvmeZoneResetAIOCB; + uint64_t offset; + uint64_t len; + BlockZoneOp op; +} NvmeZoneMgmtAIOCB; -static void nvme_zone_reset_cancel(BlockAIOCB *aiocb) +static void nvme_zone_mgmt_send_cancel(BlockAIOCB *aiocb) { - NvmeZoneResetAIOCB *iocb = container_of(aiocb, NvmeZoneResetAIOCB, common); - NvmeRequest *req = iocb->req; - NvmeNamespace *ns = req->ns; - - iocb->idx = ns->num_zones; + NvmeZoneMgmtAIOCB *iocb = container_of(aiocb, NvmeZoneMgmtAIOCB, common); iocb->ret = -ECANCELED; @@ -3874,117 +3242,66 @@ static void nvme_zone_reset_cancel(BlockAIOCB *aiocb) } } -static const AIOCBInfo nvme_zone_reset_aiocb_info = { - .aiocb_size = sizeof(NvmeZoneResetAIOCB), - .cancel_async = nvme_zone_reset_cancel, +static const AIOCBInfo nvme_zone_mgmt_aiocb_info = { + .aiocb_size = sizeof(NvmeZoneMgmtAIOCB), + .cancel_async = nvme_zone_mgmt_send_cancel, }; -static void nvme_zone_reset_cb(void *opaque, int ret); +static void nvme_zone_mgmt_send_cb(void *opaque, int ret); -static void nvme_zone_reset_epilogue_cb(void *opaque, int ret) +static void nvme_zone_mgmt_send_epilogue_cb(void *opaque, int ret) { - NvmeZoneResetAIOCB *iocb = opaque; - NvmeRequest *req = iocb->req; - NvmeNamespace *ns = req->ns; - int64_t moff; - int count; + NvmeZoneMgmtAIOCB *iocb = opaque; + NvmeNamespace *ns = iocb->req->ns; if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { - goto out; + iocb->ret = ret; + error_report("Invalid zone mgmt op %d", ret); + goto done; } - moff = nvme_moff(ns, iocb->zone->d.zslba); - count = nvme_m2b(ns, ns->zone_size); - - iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, moff, count, - BDRV_REQ_MAY_UNMAP, - nvme_zone_reset_cb, iocb); return; -out: - nvme_zone_reset_cb(iocb, ret); +done: + iocb->aiocb = NULL; + iocb->common.cb(iocb->common.opaque, iocb->ret); + qemu_aio_unref(iocb); } -static void nvme_zone_reset_cb(void *opaque, int ret) +static void nvme_zone_mgmt_send_cb(void *opaque, int ret) { - NvmeZoneResetAIOCB *iocb = opaque; + NvmeZoneMgmtAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeNamespace *ns = req->ns; + BlockBackend *blk = ns->blkconf.blk; - if (iocb->ret < 0) { - goto done; - } else if (ret < 0) { - iocb->ret = ret; - goto done; - } - - if (iocb->zone) { - nvme_zrm_reset(ns, iocb->zone); - - if (!iocb->all) { - goto done; - } 
- } - - while (iocb->idx < ns->num_zones) { - NvmeZone *zone = &ns->zone_array[iocb->idx++]; - - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EMPTY: - if (!iocb->all) { - goto done; - } - - continue; - - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_CLOSED: - case NVME_ZONE_STATE_FULL: - iocb->zone = zone; - break; - - default: - continue; - } - - trace_pci_nvme_zns_zone_reset(zone->d.zslba); - - iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, - nvme_l2b(ns, zone->d.zslba), - nvme_l2b(ns, ns->zone_size), - BDRV_REQ_MAY_UNMAP, - nvme_zone_reset_epilogue_cb, - iocb); - return; - } - -done: - iocb->aiocb = NULL; - - iocb->common.cb(iocb->common.opaque, iocb->ret); - qemu_aio_unref(iocb); + iocb->aiocb = blk_aio_zone_mgmt(blk, iocb->op, iocb->offset, + iocb->len, + nvme_zone_mgmt_send_epilogue_cb, iocb); + return; } -static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, +static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, uint32_t zidx, uint64_t elba, NvmeRequest *req) { NvmeNamespace *ns = req->ns; uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs); - uint64_t wp = zone->d.wp; - uint32_t nlb = elba - wp + 1; - uint16_t status; - + BlockZoneWps *wps = blk_get_zone_wps(ns->blkconf.blk); + uint64_t *wp = &wps->wp[zidx]; + uint64_t raw_wpv = BDRV_ZP_GET_WP(*wp); + uint8_t za = BDRV_ZP_GET_ZA(raw_wpv); + uint64_t wpv = BDRV_ZP_GET_WP(raw_wpv); + uint32_t nlb = elba - wpv + 1; if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) { return NVME_INVALID_ZONE_OP | NVME_DNR; } - if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) { + if (!(za & NVME_ZA_ZRWA_VALID)) { return NVME_INVALID_FIELD | NVME_DNR; } - if (elba < wp || elba > wp + ns->zns.zrwas) { + if (elba < wpv || elba > wpv + ns->zns.zrwas) { return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR; } @@ -3992,37 +3309,36 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, return NVME_INVALID_FIELD | NVME_DNR; } - status = nvme_zrm_auto(n, ns, zone); - if (status) { - return status; - } - - zone->w_ptr += nlb; - - nvme_advance_zone_wp(ns, zone, nlb); + *wp += nlb; return NVME_SUCCESS; } static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, - uint32_t zone_idx) + uint32_t zone_idx) { return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)]; } +#define BLK_ZO_UNSUP 0x22 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) { NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd; NvmeNamespace *ns = req->ns; - NvmeZone *zone; - NvmeZoneResetAIOCB *iocb; - uint8_t *zd_ext; + NvmeZoneMgmtAIOCB *iocb; uint64_t slba = 0; uint32_t zone_idx = 0; uint16_t status; uint8_t action = cmd->zsa; + uint8_t *zd_ext; + uint64_t offset, len; + BlockBackend *blk = ns->blkconf.blk; + uint32_t zone_size = blk_get_zone_size(blk); + uint64_t size = zone_size * blk_get_nr_zones(blk); + BlockZoneOp op = BLK_ZO_UNSUP; + /* support flag, true when the op is supported */ + bool flag = true; bool all; - enum NvmeZoneProcessingMask proc_mask = NVME_PROC_CURRENT_ZONE; all = cmd->zsflags & NVME_ZSFLAG_SELECT_ALL; @@ -4033,82 +3349,51 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) if (status) { return status; } - } - - zone = &ns->zone_array[zone_idx]; - if (slba != zone->d.zslba && action != NVME_ZONE_ACTION_ZRWA_FLUSH) { - trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba); - return NVME_INVALID_FIELD | NVME_DNR; + len = zone_size; + } else { + len = size; } switch (action) { case NVME_ZONE_ACTION_OPEN: 
- if (all) { - proc_mask = NVME_PROC_CLOSED_ZONES; - } + op = BLK_ZO_OPEN; trace_pci_nvme_open_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_open_zone, req); break; case NVME_ZONE_ACTION_CLOSE: - if (all) { - proc_mask = NVME_PROC_OPENED_ZONES; - } + op = BLK_ZO_CLOSE; trace_pci_nvme_close_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_close_zone, req); break; case NVME_ZONE_ACTION_FINISH: - if (all) { - proc_mask = NVME_PROC_OPENED_ZONES | NVME_PROC_CLOSED_ZONES; - } + op = BLK_ZO_FINISH; trace_pci_nvme_finish_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_finish_zone, req); break; case NVME_ZONE_ACTION_RESET: + op = BLK_ZO_RESET; trace_pci_nvme_reset_zone(slba, zone_idx, all); - - iocb = blk_aio_get(&nvme_zone_reset_aiocb_info, ns->blkconf.blk, - nvme_misc_cb, req); - - iocb->req = req; - iocb->ret = 0; - iocb->all = all; - iocb->idx = zone_idx; - iocb->zone = NULL; - - req->aiocb = &iocb->common; - nvme_zone_reset_cb(iocb, 0); - - return NVME_NO_COMPLETE; + break; case NVME_ZONE_ACTION_OFFLINE: - if (all) { - proc_mask = NVME_PROC_READ_ONLY_ZONES; - } + op = BLK_ZO_OFFLINE; trace_pci_nvme_offline_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_offline_zone, req); break; case NVME_ZONE_ACTION_SET_ZD_EXT: + int zd_ext_size = blk_get_zd_ext_size(blk); trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) { + if (all || !zd_ext_size) { return NVME_INVALID_FIELD | NVME_DNR; } zd_ext = nvme_get_zd_extension(ns, zone_idx); - status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req); + status = nvme_h2c(n, zd_ext, zd_ext_size, req); if (status) { trace_pci_nvme_err_zd_extension_map_error(zone_idx); return status; } - - status = nvme_set_zd_ext(ns, zone); - if (status == NVME_SUCCESS) { - trace_pci_nvme_zd_extension_set(zone_idx); - return status; - } + trace_pci_nvme_zd_extension_set(zone_idx); break; case NVME_ZONE_ACTION_ZRWA_FLUSH: @@ -4116,16 +3401,34 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - return nvme_zone_mgmt_send_zrwa_flush(n, zone, slba, req); + return nvme_zone_mgmt_send_zrwa_flush(n, zone_idx, slba, req); default: trace_pci_nvme_err_invalid_mgmt_action(action); status = NVME_INVALID_FIELD; } + if (flag && (op != BLK_ZO_UNSUP)) { + iocb = blk_aio_get(&nvme_zone_mgmt_aiocb_info, ns->blkconf.blk, + nvme_misc_cb, req); + iocb->req = req; + iocb->ret = 0; + iocb->all = all; + /* Convert it to bytes for accessing block layers */ + offset = nvme_l2b(ns, slba); + iocb->offset = offset; + iocb->len = len; + iocb->op = op; + + req->aiocb = &iocb->common; + nvme_zone_mgmt_send_cb(iocb, 0); + + return NVME_NO_COMPLETE; + } + if (status == NVME_ZONE_INVAL_TRANSITION) { trace_pci_nvme_err_invalid_zone_state_transition(action, slba, - zone->d.za); + TO_DO_ZA); } if (status) { status |= NVME_DNR; @@ -4134,50 +3437,144 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) return status; } -static bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl) +static bool nvme_zone_matches_filter(uint32_t zafs, BlockZoneState zs) { - NvmeZoneState zs = nvme_get_zone_state(zl); - switch (zafs) { case NVME_ZONE_REPORT_ALL: return true; case NVME_ZONE_REPORT_EMPTY: - return zs == NVME_ZONE_STATE_EMPTY; + return zs == BLK_ZS_EMPTY; case NVME_ZONE_REPORT_IMPLICITLY_OPEN: - return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN; + return 
zs == BLK_ZS_IOPEN; case NVME_ZONE_REPORT_EXPLICITLY_OPEN: - return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN; + return zs == BLK_ZS_EOPEN; case NVME_ZONE_REPORT_CLOSED: - return zs == NVME_ZONE_STATE_CLOSED; + return zs == BLK_ZS_CLOSED; case NVME_ZONE_REPORT_FULL: - return zs == NVME_ZONE_STATE_FULL; + return zs == BLK_ZS_FULL; case NVME_ZONE_REPORT_READ_ONLY: - return zs == NVME_ZONE_STATE_READ_ONLY; + return zs == BLK_ZS_RDONLY; case NVME_ZONE_REPORT_OFFLINE: - return zs == NVME_ZONE_STATE_OFFLINE; + return zs == BLK_ZS_OFFLINE; default: return false; } } +static void nvme_zone_mgmt_recv_completed_cb(void *opaque, int ret) +{ + NvmeZoneCmdAIOCB *iocb = opaque; + NvmeRequest *req = iocb->req; + NvmeCmd *cmd = iocb->cmd; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + int64_t zrp_size, j = 0; + uint32_t zrasf; + g_autofree void *buf = NULL; + void *buf_p; + NvmeZoneReportHeader *zrp_hdr; + uint64_t nz = iocb->zone_report_data.nr_zones; + BlockZoneDescriptor *in_zone = iocb->zone_report_data.zones; + NvmeZoneDescr *out_zone; + + if (ret < 0) { + error_report("Invalid zone recv %d", ret); + goto out; + } + + zrasf = (dw13 >> 8) & 0xff; + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { + error_report("Nvme invalid field"); + return; + } + + zrp_size = sizeof(NvmeZoneReportHeader) + sizeof(NvmeZoneDescr) * nz; + buf = g_malloc0(zrp_size); + + zrp_hdr = buf; + zrp_hdr->nr_zones = cpu_to_le64(nz); + buf_p = buf + sizeof(NvmeZoneReportHeader); + + for (; j < nz; j++) { + out_zone = buf_p; + buf_p += sizeof(NvmeZoneDescr); + + BlockZoneState zs = in_zone[j].state; + if (!nvme_zone_matches_filter(zrasf, zs)) { + continue; + } + + *out_zone = (NvmeZoneDescr) { + .zslba = nvme_b2l(req->ns, in_zone[j].start), + .zcap = nvme_b2l(req->ns, in_zone[j].cap), + .wp = nvme_b2l(req->ns, in_zone[j].wp), + }; + + switch (in_zone[j].type) { + case BLK_ZT_CONV: + out_zone->zt = NVME_ZONE_TYPE_RESERVED; + break; + case BLK_ZT_SWR: + out_zone->zt = NVME_ZONE_TYPE_SEQ_WRITE; + break; + case BLK_ZT_SWP: + out_zone->zt = NVME_ZONE_TYPE_RESERVED; + break; + default: + g_assert_not_reached(); + } + + switch (zs) { + case BLK_ZS_RDONLY: + out_zone->zs = NVME_ZONE_STATE_READ_ONLY << 4; + break; + case BLK_ZS_OFFLINE: + out_zone->zs = NVME_ZONE_STATE_OFFLINE << 4; + break; + case BLK_ZS_EMPTY: + out_zone->zs = NVME_ZONE_STATE_EMPTY << 4; + break; + case BLK_ZS_CLOSED: + out_zone->zs = NVME_ZONE_STATE_CLOSED << 4; + break; + case BLK_ZS_FULL: + out_zone->zs = NVME_ZONE_STATE_FULL << 4; + break; + case BLK_ZS_EOPEN: + out_zone->zs = NVME_ZONE_STATE_EXPLICITLY_OPEN << 4; + break; + case BLK_ZS_IOPEN: + out_zone->zs = NVME_ZONE_STATE_IMPLICITLY_OPEN << 4; + break; + case BLK_ZS_NOT_WP: + out_zone->zs = NVME_ZONE_STATE_RESERVED << 4; + break; + default: + g_assert_not_reached(); + } + } + + nvme_c2h(iocb->n, (uint8_t *)buf, zrp_size, req); + +out: + g_free(iocb->zone_report_data.zones); + g_free(iocb); + return; +} + static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = (NvmeCmd *)&req->cmd; NvmeNamespace *ns = req->ns; + BlockBackend *blk = ns->blkconf.blk; + NvmeZoneCmdAIOCB *iocb; /* cdw12 is zero-based number of dwords to return. 
Convert to bytes */ uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2; uint32_t dw13 = le32_to_cpu(cmd->cdw13); - uint32_t zone_idx, zra, zrasf, partial; - uint64_t max_zones, nr_zones = 0; + uint32_t zone_idx, zra, zrasf, partial, nr_zones; uint16_t status; uint64_t slba; - NvmeZoneDescr *z; - NvmeZone *zone; - NvmeZoneReportHeader *header; - void *buf, *buf_p; size_t zone_entry_sz; - int i; - + int64_t offset; req->status = NVME_SUCCESS; status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); @@ -4208,64 +3605,31 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) return status; } - partial = (dw13 >> 16) & 0x01; - zone_entry_sz = sizeof(NvmeZoneDescr); if (zra == NVME_ZONE_REPORT_EXTENDED) { - zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ; - } - - max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; - buf = g_malloc0(data_size); - - zone = &ns->zone_array[zone_idx]; - for (i = zone_idx; i < ns->num_zones; i++) { - if (partial && nr_zones >= max_zones) { - break; - } - if (nvme_zone_matches_filter(zrasf, zone++)) { - nr_zones++; - } + zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk); } - header = buf; - header->nr_zones = cpu_to_le64(nr_zones); - - buf_p = buf + sizeof(NvmeZoneReportHeader); - for (; zone_idx < ns->num_zones && max_zones > 0; zone_idx++) { - zone = &ns->zone_array[zone_idx]; - if (nvme_zone_matches_filter(zrasf, zone)) { - z = buf_p; - buf_p += sizeof(NvmeZoneDescr); - - z->zt = zone->d.zt; - z->zs = zone->d.zs; - z->zcap = cpu_to_le64(zone->d.zcap); - z->zslba = cpu_to_le64(zone->d.zslba); - z->za = zone->d.za; - - if (nvme_wp_is_valid(zone)) { - z->wp = cpu_to_le64(zone->d.wp); - } else { - z->wp = cpu_to_le64(~0ULL); - } - if (zra == NVME_ZONE_REPORT_EXTENDED) { - int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk); - if (zone->d.za & NVME_ZA_ZD_EXT_VALID) { - memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), - zd_ext_size); - } - buf_p += zd_ext_size; - } - - max_zones--; - } + offset = nvme_l2b(ns, slba); + nr_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; + partial = (dw13 >> 16) & 0x01; + if (!partial) { + nr_zones = blk_get_nr_zones(blk); + offset = 0; } - status = nvme_c2h(n, (uint8_t *)buf, data_size, req); - - g_free(buf); - + iocb = g_malloc0(sizeof(NvmeZoneCmdAIOCB)); + iocb->req = req; + iocb->n = n; + iocb->cmd = cmd; + iocb->zone_report_data.nr_zones = nr_zones; + iocb->zone_report_data.zones = g_malloc0( + sizeof(BlockZoneDescriptor) * nr_zones); + + blk_aio_zone_report(blk, offset, + &iocb->zone_report_data.nr_zones, + iocb->zone_report_data.zones, + nvme_zone_mgmt_recv_completed_cb, iocb); return status; } diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 45c08391f5..63106a0f27 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -219,36 +219,10 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp) static void nvme_ns_zoned_init_state(NvmeNamespace *ns) { BlockBackend *blk = ns->blkconf.blk; - uint64_t start = 0, zone_size = ns->zone_size; - uint64_t capacity = ns->num_zones * zone_size; - NvmeZone *zone; - int i; - - ns->zone_array = g_new0(NvmeZone, ns->num_zones); if (blk_get_zone_extension(blk)) { ns->zd_extensions = blk_get_zone_extension(blk); } - QTAILQ_INIT(&ns->exp_open_zones); - QTAILQ_INIT(&ns->imp_open_zones); - QTAILQ_INIT(&ns->closed_zones); - QTAILQ_INIT(&ns->full_zones); - - zone = ns->zone_array; - for (i = 0; i < ns->num_zones; i++, zone++) { - if (start + zone_size > capacity) { - zone_size = capacity - start; - } - 
zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE; - nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); - zone->d.za = 0; - zone->d.zcap = ns->zone_capacity; - zone->d.zslba = start; - zone->d.wp = start; - zone->w_ptr = start; - start += zone_size; - } - ns->zone_size_log2 = 0; if (is_power_of_2(ns->zone_size)) { ns->zone_size_log2 = 63 - clz64(ns->zone_size); @@ -319,56 +293,12 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) ns->id_ns_zoned = id_ns_z; } -static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone) -{ - uint8_t state; - - zone->w_ptr = zone->d.wp; - state = nvme_get_zone_state(zone); - if (zone->d.wp != zone->d.zslba || - (zone->d.za & NVME_ZA_ZD_EXT_VALID)) { - if (state != NVME_ZONE_STATE_CLOSED) { - trace_pci_nvme_clear_ns_close(state, zone->d.zslba); - nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED); - } - nvme_aor_inc_active(ns); - QTAILQ_INSERT_HEAD(&ns->closed_zones, zone, entry); - } else { - trace_pci_nvme_clear_ns_reset(state, zone->d.zslba); - if (zone->d.za & NVME_ZA_ZRWA_VALID) { - zone->d.za &= ~NVME_ZA_ZRWA_VALID; - ns->zns.numzrwa++; - } - nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); - } -} - /* * Close all the zones that are currently open. */ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns) { - NvmeZone *zone, *next; - - QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { - QTAILQ_REMOVE(&ns->closed_zones, zone, entry); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { - QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); - nvme_aor_dec_open(ns); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { - QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); - nvme_aor_dec_open(ns); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - - assert(ns->nr_open_zones == 0); + /* Set states (exp/imp_open/closed/full) to empty */ } static NvmeRuHandle *nvme_find_ruh_by_attr(NvmeEnduranceGroup *endgrp, @@ -662,7 +592,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns) { if (blk_get_zone_model(ns->blkconf.blk)) { g_free(ns->id_ns_zoned); - g_free(ns->zone_array); } if (ns->endgrp && ns->endgrp->fdp.enabled) { @@ -776,10 +705,6 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127), DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace, params.cross_zone_read, false), - DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace, - params.max_active_zones, 0), - DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace, - params.max_open_zones, 0), DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0), DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0), DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1), diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 37007952fc..c2d1b07f88 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -150,6 +150,9 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys, #define NVME_NS(obj) \ OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS) +#define TO_DO_STATE 0 +#define TO_DO_ZA 0 + typedef struct NvmeZone { NvmeZoneDescr d; uint64_t w_ptr; @@ -190,8 +193,6 @@ typedef struct NvmeNamespaceParams { uint8_t msrc; bool cross_zone_read; - uint32_t max_active_zones; - uint32_t max_open_zones; uint32_t numzrwa; uint64_t zrwas; @@ -228,11 +229,10 @@ typedef struct NvmeNamespace { QTAILQ_ENTRY(NvmeNamespace) entry; NvmeIdNsZoned *id_ns_zoned; - NvmeZone *zone_array; - QTAILQ_HEAD(, NvmeZone) exp_open_zones; - 
QTAILQ_HEAD(, NvmeZone) imp_open_zones; - QTAILQ_HEAD(, NvmeZone) closed_zones; - QTAILQ_HEAD(, NvmeZone) full_zones; + uint32_t *exp_open_zones; + uint32_t *imp_open_zones; + uint32_t *closed_zones; + uint32_t *full_zones; uint32_t num_zones; uint64_t zone_size; uint64_t zone_capacity; @@ -265,6 +265,12 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return 0; } +/* Bytes to LBAs */ +static inline uint64_t nvme_b2l(NvmeNamespace *ns, uint64_t lba) +{ + return lba >> ns->lbaf.ds; +} + static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba) { return lba << ns->lbaf.ds; @@ -285,70 +291,9 @@ static inline bool nvme_ns_ext(NvmeNamespace *ns) return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas); } -static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone) +static inline NvmeZoneState nvme_get_zone_state(uint64_t wp) { - return zone->d.zs >> 4; -} - -static inline void nvme_set_zone_state(NvmeZone *zone, NvmeZoneState state) -{ - zone->d.zs = state << 4; -} - -static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone) -{ - return zone->d.zslba + ns->zone_size; -} - -static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone) -{ - return zone->d.zslba + zone->d.zcap; -} - -static inline bool nvme_wp_is_valid(NvmeZone *zone) -{ - uint8_t st = nvme_get_zone_state(zone); - - return st != NVME_ZONE_STATE_FULL && - st != NVME_ZONE_STATE_READ_ONLY && - st != NVME_ZONE_STATE_OFFLINE; -} - -static inline void nvme_aor_inc_open(NvmeNamespace *ns) -{ - assert(ns->nr_open_zones >= 0); - if (ns->params.max_open_zones) { - ns->nr_open_zones++; - assert(ns->nr_open_zones <= ns->params.max_open_zones); - } -} - -static inline void nvme_aor_dec_open(NvmeNamespace *ns) -{ - if (ns->params.max_open_zones) { - assert(ns->nr_open_zones > 0); - ns->nr_open_zones--; - } - assert(ns->nr_open_zones >= 0); -} - -static inline void nvme_aor_inc_active(NvmeNamespace *ns) -{ - assert(ns->nr_active_zones >= 0); - if (ns->params.max_active_zones) { - ns->nr_active_zones++; - assert(ns->nr_active_zones <= ns->params.max_active_zones); - } -} - -static inline void nvme_aor_dec_active(NvmeNamespace *ns) -{ - if (ns->params.max_active_zones) { - assert(ns->nr_active_zones > 0); - ns->nr_active_zones--; - assert(ns->nr_active_zones >= ns->nr_open_zones); - } - assert(ns->nr_active_zones >= 0); + return wp >> 60; } static inline void nvme_fdp_stat_inc(uint64_t *a, uint64_t b) diff --git a/include/block/block-common.h b/include/block/block-common.h index d7599564db..ea213c3887 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -90,6 +90,7 @@ typedef enum BlockZoneOp { BLK_ZO_CLOSE, BLK_ZO_FINISH, BLK_ZO_RESET, + BLK_ZO_OFFLINE, } BlockZoneOp; typedef enum BlockZoneModel { @@ -269,6 +270,13 @@ typedef enum { */ #define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63)) +/* + * Clear the zone state, type and attribute information in the wp. 
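+ *
+ * BDRV_ZP_GET_WP clears the six most significant bits of the wp entry,
+ * BDRV_ZP_GET_ZS reads the zone state from the top four bits, and
+ * BDRV_ZP_GET_ZA extracts the eight-bit zone attribute field stored at
+ * bits 58:51.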
+ */
+#define BDRV_ZP_GET_WP(wp) ((wp << 6) >> 6)
+#define BDRV_ZP_GET_ZS(wp) (wp >> 60)
+#define BDRV_ZP_GET_ZA(wp) (wp & ((1ULL << 8) - 1ULL) << 51)
+
 #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                            INT_MAX >> BDRV_SECTOR_BITS)
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index c649f1ca75..ad983ad243 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -916,6 +916,8 @@ typedef struct BlockLimits {
     /* size of data that is associated with a zone in bytes */
     uint32_t zd_extension_size;
+
+    uint8_t zone_attribute;
 } BlockLimits;

 typedef struct BdrvOpBlocker BdrvOpBlocker;

From patchwork Mon Nov 27 08:56:40 2023
From: Sam Li
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de, David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz, dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs
Date: Mon, 27 Nov 2023 16:56:40 +0800
Message-Id: <20231127085641.3729-7-faithilikerun@gmail.com>
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

Signed-off-by: Sam Li
---
 block/qcow2.c        |   2 +-
 hw/nvme/ctrl.c       | 190 ++++++++++++++++++++++++++++++-----------
 include/sysemu/dma.h |   3 +
 system/dma-helpers.c |  17 ++++
 4 files changed, 162 insertions(+), 50 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dfaf5566e2..74d2e2bf39 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2290,7 +2290,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_open_zones = s->zoned_header.max_open_zones;
     bs->bl.zone_size = s->zoned_header.zone_size;
     bs->bl.zone_capacity = s->zoned_header.zone_capacity;
-    bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+    bs->bl.write_granularity = BDRV_SECTOR_SIZE; /* physical block size */
     bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b9ed3495e1..f65a87646e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1735,6 +1735,95 @@ static void nvme_misc_cb(void *opaque, int ret)
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }

+typedef struct NvmeZoneCmdAIOCB {
+    NvmeRequest *req;
+    NvmeCmd *cmd;
+    NvmeCtrl *n;
+
+    union {
+        struct {
+            uint32_t partial;
+            unsigned int nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report_data;
+        struct {
+            int64_t offset;
+        } zone_append_data;
+    };
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *cb = opaque;
+    NvmeRequest *req = cb->req;
+    int64_t *offset = (int64_t *)&req->cqe;
+
+    if (ret) {
+        nvme_aio_err(req, ret);
+    }
+
+    *offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+
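+    /* the first LBA of the appended data is returned to the host
+     * through the first two result dwords of the CQE */
+    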
nvme_enqueue_req_completion(nvme_cq(req), req); + g_free(cb); +} + +static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset, + uint32_t align, + BlockCompletionFunc *cb, + NvmeZoneCmdAIOCB *aiocb) +{ + NvmeRequest *req = aiocb->req; + assert(req->sg.flags & NVME_SG_ALLOC); + + if (req->sg.flags & NVME_SG_DMA) { + req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset, + align, cb, aiocb); + } else { + req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0, + cb, aiocb); + } +} + +static void nvme_zone_append_cb(void *opaque, int ret) +{ + NvmeZoneCmdAIOCB *aiocb = opaque; + NvmeRequest *req = aiocb->req; + NvmeNamespace *ns = req->ns; + + BlockBackend *blk = ns->blkconf.blk; + + trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); + + if (ret) { + goto out; + } + + if (ns->lbaf.ms) { + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + int64_t offset = aiocb->zone_append_data.offset; + + if (nvme_ns_ext(ns) || req->cmd.mptr) { + uint16_t status; + + nvme_sg_unmap(&req->sg); + status = nvme_map_mdata(nvme_ctrl(req), nlb, req); + if (status) { + ret = -EFAULT; + goto out; + } + + return nvme_blk_zone_append(blk, &offset, 1, + nvme_blk_zone_append_complete_cb, + aiocb); + } + } + +out: + nvme_blk_zone_append_complete_cb(aiocb, ret); +} + + void nvme_rw_complete_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -3061,6 +3150,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, uint64_t mapped_size = data_size; uint64_t data_offset; BlockBackend *blk = ns->blkconf.blk; + BlockZoneWps *wps = blk_get_zone_wps(blk); + uint32_t zone_size = blk_get_zone_size(blk); + uint32_t zone_idx; uint16_t status; if (nvme_ns_ext(ns)) { @@ -3091,42 +3183,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } if (blk_get_zone_model(blk)) { - uint32_t zone_size = blk_get_zone_size(blk); - uint32_t zone_idx = slba / zone_size; - int64_t zone_start = zone_idx * zone_size; + assert(wps); + if (zone_size) { + zone_idx = slba / zone_size; + int64_t zone_start = zone_idx * zone_size; + + if (append) { + bool piremap = !!(ctrl & NVME_RW_PIREMAP); + + if (n->params.zasl && + data_size > (uint64_t) + n->page_size << n->params.zasl) { + trace_pci_nvme_err_zasl(data_size); + return NVME_INVALID_FIELD | NVME_DNR; + } - if (append) { - bool piremap = !!(ctrl & NVME_RW_PIREMAP); + rw->slba = cpu_to_le64(slba); - if (n->params.zasl && - data_size > (uint64_t)n->page_size << n->params.zasl) { - trace_pci_nvme_err_zasl(data_size); - return NVME_INVALID_FIELD | NVME_DNR; - } + switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { + case NVME_ID_NS_DPS_TYPE_1: + if (!piremap) { + return NVME_INVALID_PROT_INFO | NVME_DNR; + } - rw->slba = cpu_to_le64(slba); - switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { - case NVME_ID_NS_DPS_TYPE_1: - if (!piremap) { - return NVME_INVALID_PROT_INFO | NVME_DNR; - } + /* fallthrough */ - /* fallthrough */ + case NVME_ID_NS_DPS_TYPE_2: + if (piremap) { + uint32_t reftag = le32_to_cpu(rw->reftag); + rw->reftag = + cpu_to_le32(reftag + (slba - zone_start)); + } - case NVME_ID_NS_DPS_TYPE_2: - if (piremap) { - uint32_t reftag = le32_to_cpu(rw->reftag); - rw->reftag = cpu_to_le32(reftag + (slba - zone_start)); - } + break; - break; + case NVME_ID_NS_DPS_TYPE_3: + if (piremap) { + return NVME_INVALID_PROT_INFO | NVME_DNR; + } - case NVME_ID_NS_DPS_TYPE_3: - if (piremap) { - return NVME_INVALID_PROT_INFO | NVME_DNR; + break; } - - break; } } @@ -3146,9 +3243,21 @@ static 
uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
         goto invalid;
     }

-    block_acct_start(blk_get_stats(blk), &req->acct, data_size,
-                     BLOCK_ACCT_WRITE);
-    nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+    if (append) {
+        NvmeZoneCmdAIOCB *cb = g_malloc(sizeof(NvmeZoneCmdAIOCB));
+        cb->req = req;
+        cb->zone_append_data.offset = data_offset;
+
+        block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                         BLOCK_ACCT_ZONE_APPEND);
+        nvme_blk_zone_append(blk, &cb->zone_append_data.offset,
+                             blk_get_write_granularity(blk),
+                             nvme_zone_append_cb, cb);
+    } else {
+        block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                         BLOCK_ACCT_WRITE);
+        nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+    }
 } else {
     req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
                                        BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
@@ -3172,24 +3281,7 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
     return nvme_do_write(n, req, false, true);
 }

-typedef struct NvmeZoneCmdAIOCB {
-    NvmeRequest *req;
-    NvmeCmd *cmd;
-    NvmeCtrl *n;
-
-    union {
-        struct {
-            uint32_t partial;
-            unsigned int nr_zones;
-            BlockZoneDescriptor *zones;
-        } zone_report_data;
-        struct {
-            int64_t offset;
-        } zone_append_data;
-    };
-} NvmeZoneCmdAIOCB;
-
-static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
 {
     return nvme_do_write(n, req, true, false);
 }

diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index a1ac5bc1b5..680e0b5477 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -301,6 +301,9 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t offset, uint32_t align,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                                QEMUSGList *sg, int64_t offset, uint32_t align,
+                                void (*cb)(void *opaque, int ret), void *opaque);
 MemTxResult dma_buf_read(void *ptr, dma_addr_t len, dma_addr_t *residual,
                          QEMUSGList *sg, MemTxAttrs attrs);
 MemTxResult dma_buf_write(void *ptr, dma_addr_t len, dma_addr_t *residual,

diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 36211acc7e..98c97a165d 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -274,6 +274,23 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
                       DMA_DIRECTION_TO_DEVICE);
 }

+static
+BlockAIOCB *dma_blk_zone_append_io_func(int64_t offset, QEMUIOVector *iov,
+                                        BlockCompletionFunc *cb, void *cb_opaque,
+                                        void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_zone_append(blk, (int64_t *)offset, iov, 0, cb, cb_opaque);
+}
+
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                                QEMUSGList *sg, int64_t offset, uint32_t align,
+                                void (*cb)(void *opaque, int ret), void *opaque)
+{
+    return dma_blk_io(blk_get_aio_context(blk), sg, offset, align,
+                      dma_blk_zone_append_io_func, blk, cb, opaque,
+                      DMA_DIRECTION_TO_DEVICE);
+}
 static MemTxResult dma_buf_rw(void *buf, dma_addr_t len, dma_addr_t *residual,
                               QEMUSGList *sg, DMADirection dir,

From patchwork Mon Nov 27 08:56:41 2023
From: Sam Li
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Klaus Jensen, qemu-block@nongnu.org, hare@suse.de, David Hildenbrand, Philippe Mathieu-Daudé, Keith Busch, Hanna Reitz, dmitry.fomichev@wdc.com, Kevin Wolf, Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal@kernel.org, Sam Li
Subject: [RFC v2 7/7] hw/nvme: make ZDED persistent
Date: Mon, 27 Nov 2023 16:56:41 +0800
Message-Id: <20231127085641.3729-8-faithilikerun@gmail.com>
In-Reply-To: <20231127085641.3729-1-faithilikerun@gmail.com>
References: <20231127085641.3729-1-faithilikerun@gmail.com>

Zone descriptor extension data (ZDED) is not persistent across QEMU
restarts. The zone descriptor extension valid bit (ZDEV) is part of the
zone attributes; it is set to one when ZDED is associated with the zone.
With a qcow2 image as the backing file, the NVMe ZNS device stores the
zone attributes of each zone in the eight bits that follow the zone type
bit of the write pointer entry. The ZDED itself is stored as part of the
zoned metadata, alongside the write pointers.

Signed-off-by: Sam Li
---
 block/qcow2.c                | 45 ++++++++++++++++++++++++++++++++++++
 hw/nvme/ctrl.c               |  1 +
 include/block/block-common.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 74d2e2bf39..861a8f9f06 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -235,6 +236,17 @@ static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
     return BLK_ZS_NOT_WP;
 }

+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+    /*
+     * The zone attribute takes up one byte. Store it after the zoned
+     * bit.
+     */
+    uint64_t addr = *wp;
+    addr |= ((uint64_t)za << 51);
+    *wp = addr;
+}
+
 /*
  * Write the new wp value to the dedicated location of the image file.
*/ @@ -4990,6 +5002,36 @@ unlock: return ret; } +static int qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index) +{ + BDRVQcow2State *s = bs->opaque; + int ret; + + qemu_co_mutex_lock(&bs->wps->colock); + uint64_t *wp = &bs->wps->wp[index]; + BlockZoneState zs = qcow2_get_zone_state(bs, index); + if (zs == BLK_ZS_EMPTY) { + ret = qcow2_check_zone_resources(bs, zs); + if (ret < 0) { + goto unlock; + } + + qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID); + ret = qcow2_write_wp_at(bs, wp, index); + if (ret < 0) { + error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp); + goto unlock; + } + s->nr_zones_closed++; + qemu_co_mutex_unlock(&bs->wps->colock); + return ret; + } + +unlock: + qemu_co_mutex_unlock(&bs->wps->colock); + return NVME_ZONE_INVAL_TRANSITION; +} + static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, int64_t offset, int64_t len) { @@ -5046,6 +5088,9 @@ static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, case BLK_ZO_OFFLINE: /* There are no transitions from the offline state to any other state */ break; + case BLK_ZO_SET_ZDED: + ret = qcow2_zns_set_zded(bs, index); + break; default: error_report("Unsupported zone op: 0x%x", op); ret = -ENOTSUP; diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index f65a87646e..c33e24e303 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -3474,6 +3474,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) break; case NVME_ZONE_ACTION_SET_ZD_EXT: + op = BLK_ZO_SET_ZDED; int zd_ext_size = blk_get_zd_ext_size(blk); trace_pci_nvme_set_descriptor_extension(slba, zone_idx); if (all || !zd_ext_size) { diff --git a/include/block/block-common.h b/include/block/block-common.h index ea213c3887..b61541599f 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -91,6 +91,7 @@ typedef enum BlockZoneOp { BLK_ZO_FINISH, BLK_ZO_RESET, BLK_ZO_OFFLINE, + BLK_ZO_SET_ZDED, } BlockZoneOp; typedef enum BlockZoneModel {
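
As a footnote to the last two patches, the write pointer encoding they
rely on can be exercised outside QEMU. The following is a minimal
standalone sketch, not part of the series: the ZP_GET_* and
ZA_ZD_EXT_VALID names are local stand-ins that mirror the BDRV_ZP_*
masks from include/block/block-common.h and the qcow2_set_za() shift
above, and it assumes NVME_ZA_ZD_EXT_VALID is 1 << 7 as defined in
QEMU's include/block/nvme.h.

#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* local copies of the masks used by the series */
#define ZP_GET_WP(wp) (((wp) << 6) >> 6)   /* clear the top six bits     */
#define ZP_GET_ZS(wp) ((wp) >> 60)         /* zone state, bits 63:60     */
#define ZP_GET_ZA(wp) ((wp) & ((((uint64_t)1 << 8) - 1) << 51)) /* 58:51 */

#define ZA_ZD_EXT_VALID ((uint64_t)1 << 7) /* assumed ZDEV bit value     */

int main(void)
{
    uint64_t wp = 0x1000;                  /* write pointer, plain LBA   */

    wp |= (uint64_t)1 << 60;               /* tag a state in top nibble  */
    wp |= ZA_ZD_EXT_VALID << 51;           /* what qcow2_set_za() does   */

    assert(ZP_GET_ZS(wp) == 1);                      /* state readable   */
    assert(ZP_GET_ZA(wp) >> 51 == ZA_ZD_EXT_VALID);  /* ZDEV readable    */
    assert(ZP_GET_WP(wp) == 0x1000);                 /* LBA recoverable  */

    printf("entry 0x%" PRIx64 " -> lba 0x%" PRIx64 "\n", wp, ZP_GET_WP(wp));
    return 0;
}

Compiled with cc -std=c99, the assertions pass: the state nibble and the
ZDEV attribute bit share a zone's 64-bit write pointer entry and can be
stripped again without disturbing the LBA, which is what lets the ZDED
valid bit persist in the qcow2 zoned metadata alongside the wp itself.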