From patchwork Mon Jan 22 19:00:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CF6A3C47DD9 for ; Mon, 22 Jan 2024 19:01:34 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzX4-0005aa-QW; Mon, 22 Jan 2024 14:00:30 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzX2-0005ZF-T0; Mon, 22 Jan 2024 14:00:28 -0500 Received: from mail-ed1-x531.google.com ([2a00:1450:4864:20::531]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX1-00023t-1t; Mon, 22 Jan 2024 14:00:28 -0500 Received: by mail-ed1-x531.google.com with SMTP id 4fb4d7f45d1cf-557dcb0f870so3969030a12.2; Mon, 22 Jan 2024 11:00:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950024; x=1706554824; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=aTarc3jgbW0f6VY8YmALJoPGDUg3nmJKlHXVbN5rD1c=; b=iKaI+bMyOx7EW+831vmSAfB/CeXbiHXXAQWoBaZLACmQa3oPz8JDKd+wUlWVDzd+pU /iPng+BUD42UyXAAdvFOE5iKb/fm71ves9ngXeVWP/x16uBlbsu0JT8+cZSiPofPCBXr 9qQzQVrnlt97jch4SFdpdiJQR35VM3c4I9UinqjmEUei678YhR5DK1OoGwBk1Q9/9xuy WHw9bu7FbeKoKdmMvVmgEufNU3bL+S5QhnrtW2G1pWky1giJfpL30lINJHD+YDymYBI9 b0LYZsvHwC1MkgEXcsKquqzZrsHnXMyatCNGeCEjA9gCjCY87Ofc1djMI16BucPdIeKI g6nQ== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950024; x=1706554824; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aTarc3jgbW0f6VY8YmALJoPGDUg3nmJKlHXVbN5rD1c=; b=gXSC6QHiHhjhW1e7ijnpbpodFXcNSMLhM9HFqjaq+gtxAJo7sqwzLyKlLGY4Jxi21q L0Hs1TffSF8VdVldADhN8W5TuCjcFAwrUejZkGICEzgbjM4T1bRQ8VDlEQM2N9SRHeCQ TtnTZEOaMrIuyXMIcHc7UYTNlgVU20PFP3nRA3VOjGDlIjKqItM2LDNSfhcB9ltgbFMx ZLGMU5xduAxe9pE8RlNpMe6bktb2DIv7lKq9w9pjTHLqbf6qWFwqV9jig6tX74iDxi74 etnqCOd77/JpGi9+ONdYzS1zWR/juU1xEIO0uoXK7HWrwd3IxRsIouSxL6bUqBre/ykb ClOg== X-Gm-Message-State: AOJu0Yy2jawA2SilSvkltwgrxt0D3fViDGeVrH5MO52CF6i++4suNzes HnhcqvnH7MJougaQaphX9trFFFTxYkNpLDgC9nG1v1jWESJsN6MESMkNJFbXMR4= X-Google-Smtp-Source: AGHT+IHgbl01qfHFVjd/LEGzkOQkMIZrlS67HN5LgAl40n0RasClh09LveU4emiUN+D8+gQkscNaqg== X-Received: by 2002:a17:906:a10e:b0:a2c:e804:e2ec with SMTP id t14-20020a170906a10e00b00a2ce804e2ecmr2112462ejy.51.1705950024238; Mon, 22 Jan 2024 11:00:24 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:23 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li Subject: [RFC v3 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Date: Mon, 22 Jan 2024 20:00:07 +0100 Message-Id: <20240122190013.41302-2-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: 
<20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::531; envelope-from=faithilikerun@gmail.com; helo=mail-ed1-x531.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The NVMe ZNS command set has the zone descriptor extension feature for associating data with a zone. Devices that support ZAC/ZBC have a zero zone descriptor extension size. Signed-off-by: Sam Li --- docs/interop/qcow2.txt | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt index a8dd4c3b15..106477d9ad 100644 --- a/docs/interop/qcow2.txt +++ b/docs/interop/qcow2.txt @@ -436,6 +436,15 @@ The fields of the zoned extension are: The offset of zoned metadata structure in the contained image, in bytes. + 44 - 51: zd_extension_size + The size of zone descriptor extension data in bytes. + The value must be a multiple of 64. + + The zone descriptor extension feature associates data + with a zone and is only available in the NVMe ZNS command + set. A value of zero indicates the feature is not + available. 
+ == Full disk encryption header pointer == The full disk encryption header must be present if, and only if, the From patchwork Mon Jan 22 19:00:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526114 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 77046C47DAF for ; Mon, 22 Jan 2024 19:01:47 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzX6-0005dU-0U; Mon, 22 Jan 2024 14:00:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzX4-0005Zw-E4; Mon, 22 Jan 2024 14:00:30 -0500 Received: from mail-ej1-x634.google.com ([2a00:1450:4864:20::634]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX2-000244-5i; Mon, 22 Jan 2024 14:00:30 -0500 Received: by mail-ej1-x634.google.com with SMTP id a640c23a62f3a-a2cea0563cbso533052866b.3; Mon, 22 Jan 2024 11:00:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950026; x=1706554826; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5hNJ04BBtl/d/n4RMb97rf/4F5jeEcyAbdZoYNre8FU=; b=HOaLKCDeBcbeDbZjwYoHUt0AlWSHwbuV5wK9+6NmNiLzd+BkK2bQgGVDL46xP/ypAj GrpwQhE27IbxFE/znO5CErfzihPf2P34ypMZRXOiHrBgpv1PlGH+c89Xbc51XWOk+IPs Ymsj9fZb1NRmYT3fZW7t61r+zmqvHgNZDGkd/u/p+2e8Hb49YOX4aMISa/egzFkP/Uam 
6EVKnI88WuZfZ/aX/TrckUNyMt83Y4rnORFOD0T+9Iw2S3w7z3HrAEJFa/OPK0cHrBii ++sfnfCEq8z2nDlyEkjK7Rkf8R4grWrXV8tMGYJoyleeFDv1H/uSo2hx8qAJMnlNxVzd j8+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950026; x=1706554826; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5hNJ04BBtl/d/n4RMb97rf/4F5jeEcyAbdZoYNre8FU=; b=xHqp1h1MStzC2fZoXN8IFLPVvGpgKwOtnofNqHbLxVF3Mehn4VMyhglugWam4rpfuk fUmxiKWJzGeNEhL9rwdGUfMpavocvNpT4JtzkleBGCxd8e6A+2maYelyt9eI4h6/JBuM LtqHvULcYyWfe/5VbKNPxbRdU1iZVZoaCiROWjHwXio04noa6ABb4rwdRHMHLn8TT9Bg 1T7rJf1qQbzy1dsAypb4lIYhisfyr46yYIrba9kMYVzzrQ1R0yrqGfaJcJphjNIMJ4ZN 1lnuacG3jUdWvfyhVqwzZKfrbhAyTHnJ/BdkjvfIFYjxDaPuP7226eLTgjg4sL9pv8q8 i3NQ== X-Gm-Message-State: AOJu0YxUao0uzR18oKdnYS108sbTfd7kTMbchKOzu1620FsJtuB9Ng3r CDI/CvCndBg8uWdAVQ9WGggia2nakZYkT0NAVPGZp5Pe3Aw+d/T8uwXC6IxTCxI= X-Google-Smtp-Source: AGHT+IH1ssW5/vSpVRxDzxg4eHkIVwWOQqIjrgfeg61KBw7T2MT0MjPO1BcTLkJPiDZomLE1AxaVAg== X-Received: by 2002:a17:906:66da:b0:a2c:1b17:d267 with SMTP id k26-20020a17090666da00b00a2c1b17d267mr2440786ejp.148.1705950025620; Mon, 22 Jan 2024 11:00:25 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:25 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li Subject: [RFC v3 2/7] qcow2: add zd_extension configurations to zoned metadata Date: Mon, 22 Jan 2024 20:00:08 +0100 Message-Id: 
<20240122190013.41302-3-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::634; envelope-from=faithilikerun@gmail.com; helo=mail-ej1-x634.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Zone descriptor extension data is host-defined data that is associated with each zone. Add zone descriptor extensions to the zonedmeta struct. 
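The layout this commit message describes can be sketched in plain C. This is a minimal illustration, not code from the series: the helper names (zd_ext_size_valid, zded_total_size, zded_offset) are hypothetical, the 64-byte multiple and 0xff limits are taken from the checks added to qcow2_check_zone_options() in the diff below, and the assumption that the extensions directly follow the array of 64-bit write pointers is inferred from how qcow2_refresh_zonedmeta() reads them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the limits mirror the checks this patch adds. */
#define ZD_EXT_ALIGN_MASK 0x3fu /* size must be a multiple of 64 bytes */
#define ZD_EXT_MAX_UNITS  0xffu /* size / 64 must fit in 8 bits */

/* A size of 0 means the feature is disabled and is accepted as valid. */
static inline bool zd_ext_size_valid(uint32_t size)
{
    if (size & ZD_EXT_ALIGN_MASK) {
        return false;
    }
    return (size >> 6) <= ZD_EXT_MAX_UNITS;
}

/* Total bytes of descriptor extension data across all zones. */
static inline uint64_t zded_total_size(uint32_t zd_ext_size, uint32_t nr_zones)
{
    return (uint64_t)zd_ext_size * nr_zones;
}

/*
 * Byte offset of zone idx's extension inside the zoned metadata,
 * assuming the extensions start right after nr_zones 64-bit write
 * pointers (an assumption, see the paragraph above).
 */
static inline uint64_t zded_offset(uint32_t zd_ext_size, uint32_t nr_zones,
                                   uint32_t idx)
{
    return sizeof(uint64_t) * (uint64_t)nr_zones +
           (uint64_t)zd_ext_size * idx;
}
```

Under these assumptions, an image with 4 zones and a 64-byte extension would hold 256 bytes of extension data, and the extension of zone 2 would start 160 bytes into the zoned metadata.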
Signed-off-by: Sam Li --- block/qcow2.c | 70 +++++++++++++++++++++++++++++--- block/qcow2.h | 2 + include/block/block_int-common.h | 6 +++ qapi/block-core.json | 4 ++ 4 files changed, 76 insertions(+), 6 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index db28585b82..5098edf656 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -448,9 +448,9 @@ qcow2_refresh_zonedmeta(BlockDriverState *bs) { int ret; BDRVQcow2State *s = bs->opaque; - uint64_t wps_size = s->zoned_header.zonedmeta_size; + uint64_t wps_size = s->zoned_header.zonedmeta_size - + s->zded_size; g_autofree uint64_t *temp; - temp = g_new(uint64_t, wps_size); ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset, wps_size, temp, 0); @@ -459,7 +459,17 @@ qcow2_refresh_zonedmeta(BlockDriverState *bs) return ret; } + g_autofree uint8_t *zded = NULL; + zded = g_try_malloc0(s->zded_size); + ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size, + s->zded_size, zded, 0); + if (ret < 0) { + error_report("Can not read zded"); + return ret; + } + memcpy(bs->wps->wp, temp, wps_size); + memcpy(bs->zd_extensions, zded, s->zded_size); return 0; } @@ -520,6 +530,19 @@ qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt) zone_opt->max_open_zones = sequential_zones; } + if (zone_opt->zd_extension_size) { + if (zone_opt->zd_extension_size & 0x3f) { + error_report("zone descriptor extension size must be a " + "multiple of 64B"); + return false; + } + + if ((zone_opt->zd_extension_size >> 6) > 0xff) { + error_report("Zone descriptor extension size is too large"); + return false; + } + } + return true; } return false; @@ -784,6 +807,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset, zoned_ext.conventional_zones = be32_to_cpu(zoned_ext.conventional_zones); zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones); + zoned_ext.zd_extension_size = + be32_to_cpu(zoned_ext.zd_extension_size); zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones); 
zoned_ext.max_active_zones = be32_to_cpu(zoned_ext.max_active_zones); @@ -794,7 +819,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset, zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size); s->zoned_header = zoned_ext; bs->wps = g_malloc(sizeof(BlockZoneWps) - + s->zoned_header.zonedmeta_size); + + zoned_ext.zonedmeta_size - s->zded_size); + bs->zd_extensions = g_malloc0(s->zded_size); ret = qcow2_refresh_zonedmeta(bs); if (ret < 0) { return ret; @@ -2370,6 +2396,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.zone_size = s->zoned_header.zone_size; bs->bl.zone_capacity = s->zoned_header.zone_capacity; bs->bl.write_granularity = BDRV_SECTOR_SIZE; + bs->bl.zd_extension_size = s->zoned_header.zd_extension_size; } static int GRAPH_UNLOCKED @@ -3621,6 +3648,8 @@ int qcow2_update_header(BlockDriverState *bs) .conventional_zones = cpu_to_be32(s->zoned_header.conventional_zones), .nr_zones = cpu_to_be32(s->zoned_header.nr_zones), + .zd_extension_size = + cpu_to_be32(s->zoned_header.zd_extension_size), .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones), .max_active_zones = cpu_to_be32(s->zoned_header.max_active_zones), @@ -4373,6 +4402,15 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) } s->zoned_header.max_append_bytes = zone_host_managed->max_append_bytes; + uint64_t zded_size = 0; + if (zone_host_managed->has_descriptor_extension_size) { + s->zoned_header.zd_extension_size = + zone_host_managed->descriptor_extension_size; + zded_size = s->zoned_header.zd_extension_size * + bs->bl.nr_zones; + } + s->zded_size = zded_size; + if (!qcow2_check_zone_options(&s->zoned_header)) { s->zoned_header.zoned = BLK_Z_NONE; ret = -EINVAL; @@ -4380,7 +4418,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) } uint32_t nrz = s->zoned_header.nr_zones; - zoned_meta_size = sizeof(uint64_t) * nrz; + zoned_meta_size = sizeof(uint64_t) * nrz + zded_size; g_autofree uint64_t 
*meta = NULL; meta = g_new0(uint64_t, nrz); @@ -4412,12 +4450,25 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) error_setg_errno(errp, -ret, "Could not zero fill zoned metadata"); goto unlock; } - ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0); + + ret = bdrv_pwrite(blk_bs(blk)->file, offset, + zoned_meta_size - zded_size, meta, 0); if (ret < 0) { error_setg_errno(errp, -ret, "Could not write zoned metadata " "to disk"); goto unlock; } + + if (zone_host_managed->has_descriptor_extension_size) { + /* Initialize zone descriptor extensions, stored after the write pointers */ + ret = bdrv_co_pwrite_zeroes(blk_bs(blk)->file, + offset + zoned_meta_size - zded_size, zded_size, 0); + if (ret < 0) { + error_setg_errno(errp, -ret, "Could not write zone descriptor " + "extensions to disk"); + goto unlock; + } + } } else { s->zoned_header.zoned = BLK_Z_NONE; } @@ -4562,6 +4613,7 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts, { BLOCK_OPT_MAX_OPEN_ZONES, "zone.max-open-zones" }, { BLOCK_OPT_MAX_ACTIVE_ZONES, "zone.max-active-zones" }, { BLOCK_OPT_MAX_APPEND_BYTES, "zone.max-append-bytes" }, + { BLOCK_OPT_ZD_EXT_SIZE, "zone.descriptor-extension-size" }, { NULL, NULL }, }; @@ -7126,7 +7178,13 @@ static QemuOptsList qcow2_create_opts = { .name = BLOCK_OPT_MAX_OPEN_ZONES, \ .type = QEMU_OPT_NUMBER, \ .help = "max open zones", \ - }, + }, \ + { \ + .name = BLOCK_OPT_ZD_EXT_SIZE, \ + .type = QEMU_OPT_SIZE, \ + .help = "zone descriptor extension size (defaults " \ + "to 0, must be a multiple of 64 bytes)", \ + }, \ QCOW_COMMON_OPTIONS, { /* end of list */ } } diff --git a/block/qcow2.h b/block/qcow2.h index 7f37bb4034..b7a8f4f4b6 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -249,6 +249,7 @@ typedef struct Qcow2ZonedHeaderExtension { uint32_t max_append_bytes; uint64_t zonedmeta_size; uint64_t zonedmeta_offset; + uint32_t zd_extension_size; /* must be multiple of 64 B */ } QEMU_PACKED Qcow2ZonedHeaderExtension; typedef struct Qcow2ZoneListEntry { @@ 
-456,6 +457,7 @@ typedef struct BDRVQcow2State { uint32_t nr_zones_exp_open; uint32_t nr_zones_imp_open; uint32_t nr_zones_closed; + uint64_t zded_size; } BDRVQcow2State; typedef struct Qcow2COWRegion { diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index d48486f344..825b8dac55 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -64,6 +64,7 @@ #define BLOCK_OPT_MAX_APPEND_BYTES "zone.max_append_bytes" #define BLOCK_OPT_MAX_ACTIVE_ZONES "zone.max_active_zones" #define BLOCK_OPT_MAX_OPEN_ZONES "zone.max_open_zones" +#define BLOCK_OPT_ZD_EXT_SIZE "zd_extension_size" #define BLOCK_PROBE_BUF_SIZE 512 @@ -912,6 +913,9 @@ typedef struct BlockLimits { uint32_t max_active_zones; uint32_t write_granularity; + + /* size of data that is associated with a zone in bytes */ + uint32_t zd_extension_size; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; @@ -1268,6 +1272,8 @@ struct BlockDriverState { /* array of write pointers' location of each zone in the zoned device. */ BlockZoneWps *wps; + + uint8_t *zd_extensions; }; struct BlockBackendRootState { diff --git a/qapi/block-core.json b/qapi/block-core.json index e2e0ec21a5..485533546a 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5077,12 +5077,16 @@ # append request that can be issued to the device. It must be # 512-byte aligned and less than the zone capacity. # +# @descriptor-extension-size: The size of zone descriptor extension +# data. 
Must be a multiple of 64 bytes (default 0) +# # Since 8.2 ## { 'struct': 'Qcow2ZoneHostManaged', 'data': { '*size': 'size', '*capacity': 'size', '*conventional-zones': 'uint32', + '*descriptor-extension-size': 'size', '*max-open-zones': 'uint32', '*max-active-zones': 'uint32', '*max-append-bytes': 'size' } } From patchwork Mon Jan 22 19:00:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526111 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 64F5DC47DDC for ; Mon, 22 Jan 2024 19:01:17 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzX7-0005eo-De; Mon, 22 Jan 2024 14:00:33 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzX5-0005cN-Kq; Mon, 22 Jan 2024 14:00:31 -0500 Received: from mail-ej1-x634.google.com ([2a00:1450:4864:20::634]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX3-00024H-6R; Mon, 22 Jan 2024 14:00:31 -0500 Received: by mail-ej1-x634.google.com with SMTP id a640c23a62f3a-a2f22bfb4e6so347392566b.0; Mon, 22 Jan 2024 11:00:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950027; x=1706554827; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9L5KA42SHaInEEgLWcBpssTsQwiyyBuWZ9qpD9e7NMo=; b=PWEtNZrMHW/VidLh8OspFyiwqQanxdCpvzX64D/NtMZSItLTothJpIT9tnre3eyukm 
N43hTw2UN4jnOGpWcpQkOGwoksLEpjEKjKgXKxLNORSDehYqgdcgVdEt1GquW6X2P83Y CQjtH3l3nUB+GfV7Tun88oHN+RO1V5YR6vnvVMVqVdJoVdYgKezSH8ZL0T644RsjVFcm bnpl3oIKvqJqnLGcBQjZxkDQ1R9DMvlYOYk1emuc4iV8jQJmgpIejJ2mdgujV+50cwHG bfhkEwBvRXtln7YdVOGMw+8hLVR+EYDXHqqq5Iy30lVjI7vhFdXEG2nTEHW9YqK57CBv KAKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950027; x=1706554827; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9L5KA42SHaInEEgLWcBpssTsQwiyyBuWZ9qpD9e7NMo=; b=PDX+sAGiIT4nYCUNekpG+NFzF34IGFAUQMrJYtWujUhbYEvPo1wLu2KdGisBk9/9uN 2Qa7O41XSVQbE/12J0IAs2hrTZskjg0lOdlwX6wSW/iZ82PLeq0SKLF5TVoZzYSMRXOS ERGJPF/CJuTyB20qPxdPGlaI3Dx5kXah6HJtcfxpUX+jsis4lRSYK2/aaDPJNp/g74vl r1Il0Fs7stDtq/Cd5C8S8YPIBWjYU9lPdu0vpTHUukcMfA02ZllYneLNAbNFp3GZHkAD F2lfThx4mj7t+C5mEm7h5N5wiUU/oDD5cg1EZQZzK8aiL9vON5vKffM8TbaNAAHk9aaB Gy5g== X-Gm-Message-State: AOJu0Yzehgf5JrC3x3axYvgVesRTBMiw8Mv7moqPglD6KjpgoT8GWLKF S+MMLqErpy8CwnNoIsi35H1MhJ6EuxVs6tEGOzJwGwVA6BAJKFnUl3g6OtSFJOU= X-Google-Smtp-Source: AGHT+IGsmrT9ZKfz31X1a/7KvT3Jw3GmPW3UGwCL6rRxvJw7nDTFgj6qeYymvH1ESOEE8QFv0USXJw== X-Received: by 2002:a17:906:5ca:b0:a2d:d8f0:d987 with SMTP id t10-20020a17090605ca00b00a2dd8f0d987mr2208574ejt.33.1705950027048; Mon, 22 Jan 2024 11:00:27 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:26 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li 
Subject: [RFC v3 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer Date: Mon, 22 Jan 2024 20:00:09 +0100 Message-Id: <20240122190013.41302-4-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::634; envelope-from=faithilikerun@gmail.com; helo=mail-ej1-x634.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The zone information is contained in the BlockLimits fields. Add blk_get_*() functions to read it from the block layer and update the NVMe device emulation to use them. Signed-off-by: Sam Li --- block/block-backend.c | 72 +++++++++++++++++++++++++++++++ hw/nvme/ctrl.c | 34 +++++---------- hw/nvme/ns.c | 61 ++++++++------------------ hw/nvme/nvme.h | 3 -- include/sysemu/block-backend-io.h | 9 ++++ 5 files changed, 111 insertions(+), 68 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 209eb07528..c23f2a731b 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2359,6 +2359,78 @@ int blk_get_max_iov(BlockBackend *blk) return blk->root->bs->bl.max_iov; } +uint8_t blk_get_zone_model(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + return bs ? 
bs->bl.zoned: 0; + +} + +uint32_t blk_get_zone_size(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.zone_size : 0; +} + +uint32_t blk_get_zone_capacity(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.zone_capacity : 0; +} + +uint32_t blk_get_max_open_zones(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.max_open_zones : 0; +} + +uint32_t blk_get_max_active_zones(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.max_active_zones : 0; +} + +uint32_t blk_get_max_append_sectors(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.max_append_sectors : 0; +} + +uint32_t blk_get_nr_zones(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.nr_zones : 0; +} + +uint32_t blk_get_write_granularity(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.write_granularity : 0; +} + +BlockZoneWps *blk_get_zone_wps(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? 
bs->wps : NULL; +} + void *blk_try_blockalign(BlockBackend *blk, size_t size) { IO_CODE(); diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index f026245d1e..e64b021454 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act, uint32_t opn, uint32_t zrwa) { - if (ns->params.max_active_zones != 0 && - ns->nr_active_zones + act > ns->params.max_active_zones) { - trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones); - return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR; - } - - if (ns->params.max_open_zones != 0 && - ns->nr_open_zones + opn > ns->params.max_open_zones) { - trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones); - return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR; - } - if (zrwa > ns->zns.numzrwa) { return NVME_NOZRWA | NVME_DNR; } @@ -1988,9 +1976,9 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone) static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns) { NvmeZone *zone; + int moz = blk_get_max_open_zones(ns->blkconf.blk); - if (ns->params.max_open_zones && - ns->nr_open_zones == ns->params.max_open_zones) { + if (moz && ns->nr_open_zones == moz) { zone = QTAILQ_FIRST(&ns->imp_open_zones); if (zone) { /* @@ -2160,7 +2148,7 @@ void nvme_rw_complete_cb(void *opaque, int ret) block_acct_done(stats, acct); } - if (ns->params.zoned && nvme_is_write(req)) { + if (blk_get_zone_model(blk) && nvme_is_write(req)) { nvme_finalize_zoned_write(ns, req); } @@ -2882,7 +2870,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret) goto out; } - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { nvme_advance_zone_wp(ns, iocb->zone, nlb); } @@ -2994,7 +2982,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret) goto invalid; } - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, 
nlb); if (status) { goto invalid; @@ -3088,7 +3076,7 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb) } } - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { status = nvme_check_zone_read(ns, slba, nlb); if (status) { goto invalid; @@ -3164,7 +3152,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) iocb->slba = le64_to_cpu(copy->sdlba); - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba); if (!iocb->zone) { status = NVME_LBA_RANGE | NVME_DNR; @@ -3434,7 +3422,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) goto invalid; } - if (ns->params.zoned) { + if (blk_get_zone_model(blk)) { status = nvme_check_zone_read(ns, slba, nlb); if (status) { trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); @@ -3549,7 +3537,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, goto invalid; } - if (ns->params.zoned) { + if (blk_get_zone_model(blk)) { zone = nvme_get_zone_by_slba(ns, slba); assert(zone); @@ -3667,7 +3655,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, uint32_t dw10 = le32_to_cpu(c->cdw10); uint32_t dw11 = le32_to_cpu(c->cdw11); - if (!ns->params.zoned) { + if (!blk_get_zone_model(ns->blkconf.blk)) { trace_pci_nvme_err_invalid_opc(c->opcode); return NVME_INVALID_OPCODE | NVME_DNR; } @@ -6527,7 +6515,7 @@ done: static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi) { - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { return NVME_INVALID_FORMAT | NVME_DNR; } diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 0eabcf5cf5..82d4f7932d 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -25,7 +25,6 @@ #include "trace.h" #define MIN_DISCARD_GRANULARITY (4 * KiB) -#define NVME_DEFAULT_ZONE_SIZE (128 * MiB) void nvme_ns_init_format(NvmeNamespace *ns) { @@ -177,19 +176,11 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp) static int 
nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp) { - uint64_t zone_size, zone_cap; + BlockBackend *blk = ns->blkconf.blk; + uint64_t zone_size = blk_get_zone_size(blk); + uint64_t zone_cap = blk_get_zone_capacity(blk); /* Make sure that the values of ZNS properties are sane */ - if (ns->params.zone_size_bs) { - zone_size = ns->params.zone_size_bs; - } else { - zone_size = NVME_DEFAULT_ZONE_SIZE; - } - if (ns->params.zone_cap_bs) { - zone_cap = ns->params.zone_cap_bs; - } else { - zone_cap = zone_size; - } if (zone_cap > zone_size) { error_setg(errp, "zone capacity %"PRIu64"B exceeds " "zone size %"PRIu64"B", zone_cap, zone_size); @@ -266,6 +257,7 @@ static void nvme_ns_zoned_init_state(NvmeNamespace *ns) static void nvme_ns_init_zoned(NvmeNamespace *ns) { + BlockBackend *blk = ns->blkconf.blk; NvmeIdNsZoned *id_ns_z; int i; @@ -274,8 +266,8 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) id_ns_z = g_new0(NvmeIdNsZoned, 1); /* MAR/MOR are zeroes-based, FFFFFFFFFh means no limit */ - id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1); - id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1); + id_ns_z->mar = cpu_to_le32(blk_get_max_active_zones(blk) - 1); + id_ns_z->mor = cpu_to_le32(blk_get_max_open_zones(blk) - 1); id_ns_z->zoc = 0; id_ns_z->ozcs = ns->params.cross_zone_read ? 
NVME_ID_NS_ZONED_OZCS_RAZB : 0x00; @@ -539,6 +531,7 @@ static bool nvme_ns_init_fdp(NvmeNamespace *ns, Error **errp) static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { + BlockBackend *blk = ns->blkconf.blk; unsigned int pi_size; if (!ns->blkconf.blk) { @@ -577,25 +570,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) return -1; } - if (ns->params.zoned && ns->endgrp && ns->endgrp->fdp.enabled) { + if (blk_get_zone_model(blk) && ns->endgrp && ns->endgrp->fdp.enabled) { error_setg(errp, "cannot be a zoned- in an FDP configuration"); return -1; } - if (ns->params.zoned) { - if (ns->params.max_active_zones) { - if (ns->params.max_open_zones > ns->params.max_active_zones) { - error_setg(errp, "max_open_zones (%u) exceeds " - "max_active_zones (%u)", ns->params.max_open_zones, - ns->params.max_active_zones); - return -1; - } - - if (!ns->params.max_open_zones) { - ns->params.max_open_zones = ns->params.max_active_zones; - } - } - + if (blk_get_zone_model(blk)) { if (ns->params.zd_extension_size) { if (ns->params.zd_extension_size & 0x3f) { error_setg(errp, "zone descriptor extension size must be a " @@ -630,14 +610,14 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) return -1; } - if (ns->params.max_active_zones) { - if (ns->params.numzrwa > ns->params.max_active_zones) { + int maz = blk_get_max_active_zones(blk); + if (maz) { + if (ns->params.numzrwa > maz) { error_setg(errp, "number of zone random write area " "resources (zoned.numzrwa, %d) must be less " "than or equal to maximum active resources " "(zoned.max_active_zones, %d)", - ns->params.numzrwa, - ns->params.max_active_zones); + ns->params.numzrwa, maz); return -1; } } @@ -660,7 +640,7 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp) if (nvme_ns_init(ns, errp)) { return -1; } - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { if (nvme_ns_zoned_check_calc_geometry(ns, errp) != 0) { return -1; } @@ -683,15 +663,17 @@ 
void nvme_ns_drain(NvmeNamespace *ns) void nvme_ns_shutdown(NvmeNamespace *ns) { - blk_flush(ns->blkconf.blk); - if (ns->params.zoned) { + + BlockBackend *blk = ns->blkconf.blk; + blk_flush(blk); + if (blk_get_zone_model(blk)) { nvme_zoned_ns_shutdown(ns); } } void nvme_ns_cleanup(NvmeNamespace *ns) { - if (ns->params.zoned) { + if (blk_get_zone_model(ns->blkconf.blk)) { g_free(ns->id_ns_zoned); g_free(ns->zone_array); g_free(ns->zd_extensions); @@ -806,11 +788,6 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128), DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128), DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127), - DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), - DEFINE_PROP_SIZE("zoned.zone_size", NvmeNamespace, params.zone_size_bs, - NVME_DEFAULT_ZONE_SIZE), - DEFINE_PROP_SIZE("zoned.zone_capacity", NvmeNamespace, params.zone_cap_bs, - 0), DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace, params.cross_zone_read, false), DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace, diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 5f2ae7b28b..76677a86e9 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -189,10 +189,7 @@ typedef struct NvmeNamespaceParams { uint32_t mcl; uint8_t msrc; - bool zoned; bool cross_zone_read; - uint64_t zone_size_bs; - uint64_t zone_cap_bs; uint32_t max_active_zones; uint32_t max_open_zones; uint32_t zd_extension_size; diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index d174275a5c..44e44954fa 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -99,6 +99,15 @@ void blk_error_action(BlockBackend *blk, BlockErrorAction action, void blk_iostatus_set_err(BlockBackend *blk, int error); int blk_get_max_iov(BlockBackend *blk); int blk_get_max_hw_iov(BlockBackend *blk); +uint8_t blk_get_zone_model(BlockBackend *blk); +uint32_t blk_get_zone_size(BlockBackend *blk); +uint32_t 
blk_get_zone_capacity(BlockBackend *blk); +uint32_t blk_get_max_open_zones(BlockBackend *blk); +uint32_t blk_get_max_active_zones(BlockBackend *blk); +uint32_t blk_get_max_append_sectors(BlockBackend *blk); +uint32_t blk_get_nr_zones(BlockBackend *blk); +uint32_t blk_get_write_granularity(BlockBackend *blk); +BlockZoneWps *blk_get_zone_wps(BlockBackend *blk); AioContext *blk_get_aio_context(BlockBackend *blk); BlockAcctStats *blk_get_stats(BlockBackend *blk); From patchwork Mon Jan 22 19:00:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 53CDBC47DD9 for ; Mon, 22 Jan 2024 19:01:17 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzX8-0005fC-2u; Mon, 22 Jan 2024 14:00:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzX7-0005ef-1T; Mon, 22 Jan 2024 14:00:33 -0500 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX5-00024c-22; Mon, 22 Jan 2024 14:00:32 -0500 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-559cef15db5so8054606a12.0; Mon, 22 Jan 2024 11:00:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950029; x=1706554829; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+OvmjsEOUcoHzwGCXwmbsbs97JD2U8yE+EAWzlHpR9c=; b=PU0PVaBw+kXCLPQYCjlD6bzzUqQggivQy/MVYm7cye6vx+X8SFGw1ETGRWc6H2eifN xeH9AZJ+/s6bnTgT5l657G5ZSNVbemrZ5Tu6gCTT4UG2IpTfAjFNljvlPDjeZp9btDKP 00wK9RH3zOHk+bzSKFHYVoSuh7Hqb+1MiwYZf2QYOQXEn7J13XpGIUqG+AK3KcD1zlA+ znzGoPdn2Vp4pw3Z2wQ/NBxYHvrUX6b/a3tTa2dZCP17DzlKZk+aM2sU4vjWkCppC1Rh a1lKjQ1yyqmZWKK8WaInj1bst7XulP0LUDb7L9l7OKrupHoz7RdZ0iSZ0jD3R9ggLozq NjAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950029; x=1706554829; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+OvmjsEOUcoHzwGCXwmbsbs97JD2U8yE+EAWzlHpR9c=; b=QXrj+bKQLLEnqaWjzk9y3+EED5WhCgaj0M53cMbjnBR6ngM8HsVsXDF8oU9C965scR leVFVnCYxkdrE9LdQiVp3bfCeM3kpb4XWt3Bb2n++s2AJz4zdPlidPZGibZbuvs9HiVo AVyz6AdaKoFHWkwS7aU/aZqZVQZBqWpYGYmnA0Roqqi1nYxoaWg8vTPgLGeJoEWXKijZ eaK2udISHxsealgu0Pq+DoPjDJm5Yst120601w/7SiG0hvGElkY9wflrxY/I5rmF58Gi Gm6o4Ir5DUAvqGqC3KzquRLItXQW6jWyhHUsrMJXK67Zu7MeAjBG0hcy8E968q1ycmFc 1Dmw== X-Gm-Message-State: AOJu0YykyZmYqIt9MFaAE2WIZnNqm8FBV0zS33ecl5+YNIq/06OVDPcO KsE8LOLiYOSMcp198Lb92D8Ri1V6wiPcksxwRjEF5ypoZuQV4HO6o4UC/LFNHLA= X-Google-Smtp-Source: AGHT+IFpvk7tWZpswHIsn7Vwd09dqjZ3xADYWH8+OrTeyFyOWJcBYquxtJen3928eYl5NU5H9nrS/Q== X-Received: by 2002:a17:906:e297:b0:a2c:aaa5:1a10 with SMTP id gg23-20020a170906e29700b00a2caaa51a10mr4820768ejb.5.1705950028646; Mon, 22 Jan 2024 11:00:28 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:28 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David 
Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li Subject: [RFC v3 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions Date: Mon, 22 Jan 2024 20:00:10 +0100 Message-Id: <20240122190013.41302-5-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::533; envelope-from=faithilikerun@gmail.com; helo=mail-ed1-x533.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Signed-off-by: Sam Li --- block/block-backend.c | 16 ++++++++++++++++ hw/nvme/ctrl.c | 20 ++++++++++++++------ hw/nvme/ns.c | 24 ++++-------------------- hw/nvme/nvme.h | 7 ------- include/sysemu/block-backend-io.h | 2 ++ 5 files changed, 36 insertions(+), 33 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index c23f2a731b..3bebee12b9 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2431,6 +2431,22 @@ BlockZoneWps *blk_get_zone_wps(BlockBackend *blk) return bs ? bs->wps : NULL; } +uint8_t *blk_get_zone_extension(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? 
bs->zd_extensions : NULL; +} + +uint32_t blk_get_zd_ext_size(BlockBackend *blk) +{ + BlockDriverState *bs = blk_bs(blk); + IO_CODE(); + + return bs ? bs->bl.zd_extension_size : 0; +} + void *blk_try_blockalign(BlockBackend *blk, size_t size) { IO_CODE(); diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index e64b021454..dae6f00e4f 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -4004,6 +4004,12 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, return NVME_SUCCESS; } +static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, + uint32_t zone_idx) +{ + return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)]; +} + static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) { NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd; @@ -4088,11 +4094,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) case NVME_ZONE_ACTION_SET_ZD_EXT: trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - if (all || !ns->params.zd_extension_size) { + if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) { return NVME_INVALID_FIELD | NVME_DNR; } zd_ext = nvme_get_zd_extension(ns, zone_idx); - status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req); + status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req); if (status) { trace_pci_nvme_err_zd_extension_map_error(zone_idx); return status; @@ -4183,7 +4189,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) { return NVME_INVALID_FIELD | NVME_DNR; } - if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) { + if (zra == NVME_ZONE_REPORT_EXTENDED && + !blk_get_zd_ext_size(ns->blkconf.blk)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -4205,7 +4212,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) zone_entry_sz = sizeof(NvmeZoneDescr); if (zra == NVME_ZONE_REPORT_EXTENDED) { - zone_entry_sz += ns->params.zd_extension_size; 
+ zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ; } max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; @@ -4243,11 +4250,12 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) } if (zra == NVME_ZONE_REPORT_EXTENDED) { + int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk); if (zone->d.za & NVME_ZA_ZD_EXT_VALID) { memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), - ns->params.zd_extension_size); + zd_ext_size); } - buf_p += ns->params.zd_extension_size; + buf_p += zd_ext_size; } max_zones--; diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 82d4f7932d..45c08391f5 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -218,15 +218,15 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp) static void nvme_ns_zoned_init_state(NvmeNamespace *ns) { + BlockBackend *blk = ns->blkconf.blk; uint64_t start = 0, zone_size = ns->zone_size; uint64_t capacity = ns->num_zones * zone_size; NvmeZone *zone; int i; ns->zone_array = g_new0(NvmeZone, ns->num_zones); - if (ns->params.zd_extension_size) { - ns->zd_extensions = g_malloc0(ns->params.zd_extension_size * - ns->num_zones); + if (blk_get_zone_extension(blk)) { + ns->zd_extensions = blk_get_zone_extension(blk); } QTAILQ_INIT(&ns->exp_open_zones); @@ -275,7 +275,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) for (i = 0; i <= ns->id_ns.nlbaf; i++) { id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size); id_ns_z->lbafe[i].zdes = - ns->params.zd_extension_size >> 6; /* Units of 64B */ + blk_get_zd_ext_size(blk) >> 6; /* Units of 64B */ } if (ns->params.zrwas) { @@ -576,19 +576,6 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) } if (blk_get_zone_model(blk)) { - if (ns->params.zd_extension_size) { - if (ns->params.zd_extension_size & 0x3f) { - error_setg(errp, "zone descriptor extension size must be a " - "multiple of 64B"); - return -1; - } - if ((ns->params.zd_extension_size >> 6) > 0xff) { - error_setg(errp, - "zone descriptor extension 
size is too large"); - return -1; - } - } - if (ns->params.zrwas) { if (ns->params.zrwas % ns->blkconf.logical_block_size) { error_setg(errp, "zone random write area size (zoned.zrwas " @@ -676,7 +663,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns) if (blk_get_zone_model(ns->blkconf.blk)) { g_free(ns->id_ns_zoned); g_free(ns->zone_array); - g_free(ns->zd_extensions); } if (ns->endgrp && ns->endgrp->fdp.enabled) { @@ -794,8 +780,6 @@ static Property nvme_ns_props[] = { params.max_active_zones, 0), DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace, params.max_open_zones, 0), - DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace, - params.zd_extension_size, 0), DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0), DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0), DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1), diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 76677a86e9..37007952fc 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -192,7 +192,6 @@ typedef struct NvmeNamespaceParams { bool cross_zone_read; uint32_t max_active_zones; uint32_t max_open_zones; - uint32_t zd_extension_size; uint32_t numzrwa; uint64_t zrwas; @@ -315,12 +314,6 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } -static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, - uint32_t zone_idx) -{ - return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size]; -} - static inline void nvme_aor_inc_open(NvmeNamespace *ns) { assert(ns->nr_open_zones >= 0); diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index 44e44954fa..ab388801b1 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -108,6 +108,8 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk); uint32_t blk_get_nr_zones(BlockBackend *blk); uint32_t blk_get_write_granularity(BlockBackend *blk); BlockZoneWps *blk_get_zone_wps(BlockBackend *blk); +uint8_t 
*blk_get_zone_extension(BlockBackend *blk); +uint32_t blk_get_zd_ext_size(BlockBackend *blk); AioContext *blk_get_aio_context(BlockBackend *blk); BlockAcctStats *blk_get_stats(BlockBackend *blk); From patchwork Mon Jan 22 19:00:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526112 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 87440C47DAF for ; Mon, 22 Jan 2024 19:01:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzXC-0005gv-OC; Mon, 22 Jan 2024 14:00:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzXB-0005gC-9c; Mon, 22 Jan 2024 14:00:37 -0500 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX7-000256-53; Mon, 22 Jan 2024 14:00:36 -0500 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-559cef15db5so8054675a12.0; Mon, 22 Jan 2024 11:00:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950031; x=1706554831; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KJj4/sPsUc3WFPNiDK5H2Uhaq/LbbdIs1YuKh3jTyws=; b=cUYNcGAXT6hIE22KvVSWevEHMvV5ku6IqwySOytxWeEb3pVSu4lq/XB2VykLpR9ENe t26Ft7PFjF8mmc833EkfPwIZ+1HcbVigysKIm3g4joP4A4hjyg1Ab/9q3KGGwUcbgRTX 
61eXMYxpyXmiYtv5BcYBE81oTTmYXoaVshS0ef8oB6JXyQSgnAslg52utoH58lyEKNMm rBN+yz+VAfUUCHfEbt2l234OBMpqoIIpNY/oOAdDCUHSHUgqpgfpShIZgimcr/huxyey MkeImh8n2iEe0I1E52A1viV/OQiRK9zUAiXULNw9xadWWKdNUCQjj0rSjDMtTDDhkgrf wD+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950031; x=1706554831; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KJj4/sPsUc3WFPNiDK5H2Uhaq/LbbdIs1YuKh3jTyws=; b=jSehBPvOGwwrJs3bWTJA9DCTgGFYPBe6lPajYgtOymsx+6Tcr6Z4DC6j+5TZyWRbST HgWN6WVRWw67IgWpt0k2jsNjblBVUw2KOBOAkzkOPk9pWf+kif0jJ4rm+Lo9bY42v/q4 jF7Nc6NTZfc0UKYxVFRxkghhqPM0okNRPSYyoV2iJwf7WSTiZSv9gP2KQIdrPAii5xfv 7/xZUsVyEyWuggvNVrmSfG4Mfm2H4w7wfVuLOut8ers56cxD9kWHcmPLSRyjQuA3TbOg LbYS/UniqOokISV+j6tylYQllodz+0SgTq+K6Eah3iuotkOaCxg4RRcIt5uMNb8YLKVD AjOA== X-Gm-Message-State: AOJu0Yzpjstg1Mq2O8xy/nD+zklIzmvhiuST2ZrUOqmZd+UgKWu7KtF+ QwIYzjUFHItjcEJQoGDNoXKgMRkUbZ4//odlfl1oRXPaiy47CWYZWK1ozPXlzhw= X-Google-Smtp-Source: AGHT+IGFAs5rXOG2DKYBICletP8EEBgE0H7Xp9XT442x3o4lsUAV8kLTcFZ9teQYwBjjz6kQ+gX4pg== X-Received: by 2002:a17:906:32ce:b0:a26:f7ea:7cb6 with SMTP id k14-20020a17090632ce00b00a26f7ea7cb6mr6992790ejk.16.1705950030674; Mon, 22 Jan 2024 11:00:30 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:30 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li Subject: [RFC v3 5/7] hw/nvme: make the metadata of ZNS emulation 
persistent Date: Mon, 22 Jan 2024 20:00:11 +0100 Message-Id: <20240122190013.41302-6-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::52a; envelope-from=faithilikerun@gmail.com; helo=mail-ed1-x52a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org NVMe ZNS devices follow the NVMe ZNS spec, but the state of namespace zones does not persist across restarts of QEMU. This patch makes the metadata of ZNS emulation persistent by using the new block layer APIs. The ZNS device calls the zone report and zone mgmt APIs of the block layer, which handle zone state transitions and manage zone resources. 
Signed-off-by: Sam Li --- block/qcow2.c | 3 + hw/nvme/ctrl.c | 1115 +++++++----------------------- hw/nvme/ns.c | 77 +-- hw/nvme/nvme.h | 85 +-- include/block/block-common.h | 8 + include/block/block_int-common.h | 2 + 6 files changed, 264 insertions(+), 1026 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 5098edf656..0bb249fa6e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -5107,6 +5107,9 @@ qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, case BLK_ZO_RESET: ret = qcow2_reset_zone(bs, index, len); break; + case BLK_ZO_OFFLINE: + /* There are no transitions from the offline state to any other state */ + break; default: error_report("Unsupported zone op: 0x%x", op); ret = -ENOTSUP; diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index dae6f00e4f..e31aa52c06 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid, return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg); } -static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state) -{ - if (QTAILQ_IN_USE(zone, entry)) { - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); - break; - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); - break; - case NVME_ZONE_STATE_CLOSED: - QTAILQ_REMOVE(&ns->closed_zones, zone, entry); - break; - case NVME_ZONE_STATE_FULL: - QTAILQ_REMOVE(&ns->full_zones, zone, entry); - default: - ; - } - } - - nvme_set_zone_state(zone, state); - - switch (state) { - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry); - break; - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry); - break; - case NVME_ZONE_STATE_CLOSED: - QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); - break; - case NVME_ZONE_STATE_FULL: - QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); - 
case NVME_ZONE_STATE_READ_ONLY: - break; - default: - zone->d.za = 0; - } -} - -static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act, - uint32_t opn, uint32_t zrwa) -{ - if (zrwa > ns->zns.numzrwa) { - return NVME_NOZRWA | NVME_DNR; - } - - return NVME_SUCCESS; -} - -/* - * Check if we can open a zone without exceeding open/active limits. - * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5). - */ -static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn) -{ - return nvme_zns_check_resources(ns, act, opn, 0); -} - static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer *ebuf) { NvmeFdpEvent *ret = NULL; @@ -1769,355 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba) slba / ns->zone_size; } -static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba) -{ - uint32_t zone_idx = nvme_zone_idx(ns, slba); - - if (zone_idx >= ns->num_zones) { - return NULL; - } - - return &ns->zone_array[zone_idx]; -} - -static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone) -{ - uint64_t zslba = zone->d.zslba; - - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EMPTY: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_CLOSED: - return NVME_SUCCESS; - case NVME_ZONE_STATE_FULL: - trace_pci_nvme_err_zone_is_full(zslba); - return NVME_ZONE_FULL; - case NVME_ZONE_STATE_OFFLINE: - trace_pci_nvme_err_zone_is_offline(zslba); - return NVME_ZONE_OFFLINE; - case NVME_ZONE_STATE_READ_ONLY: - trace_pci_nvme_err_zone_is_read_only(zslba); - return NVME_ZONE_READ_ONLY; - default: - assert(false); - } - - return NVME_INTERNAL_DEV_ERROR; -} - -static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone, - uint64_t slba, uint32_t nlb) -{ - uint64_t zcap = nvme_zone_wr_boundary(zone); - uint16_t status; - - status = nvme_check_zone_state_for_write(zone); - if (status) { - return status; - } - - if 
(zone->d.za & NVME_ZA_ZRWA_VALID) { - uint64_t ezrwa = zone->w_ptr + 2 * ns->zns.zrwas; - - if (slba < zone->w_ptr || slba + nlb > ezrwa) { - trace_pci_nvme_err_zone_invalid_write(slba, zone->w_ptr); - return NVME_ZONE_INVALID_WRITE; - } - } else { - if (unlikely(slba != zone->w_ptr)) { - trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba, - zone->w_ptr); - return NVME_ZONE_INVALID_WRITE; - } - } - - if (unlikely((slba + nlb) > zcap)) { - trace_pci_nvme_err_zone_boundary(slba, nlb, zcap); - return NVME_ZONE_BOUNDARY_ERROR; - } - - return NVME_SUCCESS; -} - -static uint16_t nvme_check_zone_state_for_read(NvmeZone *zone) -{ - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EMPTY: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_FULL: - case NVME_ZONE_STATE_CLOSED: - case NVME_ZONE_STATE_READ_ONLY: - return NVME_SUCCESS; - case NVME_ZONE_STATE_OFFLINE: - trace_pci_nvme_err_zone_is_offline(zone->d.zslba); - return NVME_ZONE_OFFLINE; - default: - assert(false); - } - - return NVME_INTERNAL_DEV_ERROR; -} - -static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba, - uint32_t nlb) -{ - NvmeZone *zone; - uint64_t bndry, end; - uint16_t status; - - zone = nvme_get_zone_by_slba(ns, slba); - assert(zone); - - bndry = nvme_zone_rd_boundary(ns, zone); - end = slba + nlb; - - status = nvme_check_zone_state_for_read(zone); - if (status) { - ; - } else if (unlikely(end > bndry)) { - if (!ns->params.cross_zone_read) { - status = NVME_ZONE_BOUNDARY_ERROR; - } else { - /* - * Read across zone boundary - check that all subsequent - * zones that are being read have an appropriate state. 
- */ - do { - zone++; - status = nvme_check_zone_state_for_read(zone); - if (status) { - break; - } - } while (end > nvme_zone_rd_boundary(ns, zone)); - } - } - - return status; -} - -static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone) -{ - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_FULL: - return NVME_SUCCESS; - - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - nvme_aor_dec_open(ns); - /* fallthrough */ - case NVME_ZONE_STATE_CLOSED: - nvme_aor_dec_active(ns); - - if (zone->d.za & NVME_ZA_ZRWA_VALID) { - zone->d.za &= ~NVME_ZA_ZRWA_VALID; - if (ns->params.numzrwa) { - ns->zns.numzrwa++; - } - } - - /* fallthrough */ - case NVME_ZONE_STATE_EMPTY: - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); - return NVME_SUCCESS; - - default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone) -{ - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - nvme_aor_dec_open(ns); - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); - /* fall through */ - case NVME_ZONE_STATE_CLOSED: - return NVME_SUCCESS; - - default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone) -{ - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - nvme_aor_dec_open(ns); - /* fallthrough */ - case NVME_ZONE_STATE_CLOSED: - nvme_aor_dec_active(ns); - - if (zone->d.za & NVME_ZA_ZRWA_VALID) { - if (ns->params.numzrwa) { - ns->zns.numzrwa++; - } - } - - /* fallthrough */ - case NVME_ZONE_STATE_FULL: - zone->w_ptr = zone->d.zslba; - zone->d.wp = zone->w_ptr; - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY); - /* fallthrough */ - case NVME_ZONE_STATE_EMPTY: - return NVME_SUCCESS; - - default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static void 
nvme_zrm_auto_transition_zone(NvmeNamespace *ns) -{ - NvmeZone *zone; - int moz = blk_get_max_open_zones(ns->blkconf.blk); - - if (moz && ns->nr_open_zones == moz) { - zone = QTAILQ_FIRST(&ns->imp_open_zones); - if (zone) { - /* - * Automatically close this implicitly open zone. - */ - QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); - nvme_zrm_close(ns, zone); - } - } -} - enum { NVME_ZRM_AUTO = 1 << 0, NVME_ZRM_ZRWA = 1 << 1, }; -static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns, - NvmeZone *zone, int flags) -{ - int act = 0; - uint16_t status; - - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EMPTY: - act = 1; - - /* fallthrough */ - - case NVME_ZONE_STATE_CLOSED: - if (n->params.auto_transition_zones) { - nvme_zrm_auto_transition_zone(ns); - } - status = nvme_zns_check_resources(ns, act, 1, - (flags & NVME_ZRM_ZRWA) ? 1 : 0); - if (status) { - return status; - } - - if (act) { - nvme_aor_inc_active(ns); - } - - nvme_aor_inc_open(ns); - - if (flags & NVME_ZRM_AUTO) { - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); - return NVME_SUCCESS; - } - - /* fallthrough */ - - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - if (flags & NVME_ZRM_AUTO) { - return NVME_SUCCESS; - } - - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); - - /* fallthrough */ - - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - if (flags & NVME_ZRM_ZRWA) { - ns->zns.numzrwa--; - - zone->d.za |= NVME_ZA_ZRWA_VALID; - } - - return NVME_SUCCESS; - - default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns, - NvmeZone *zone) -{ - return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO); -} - -static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, - uint32_t nlb) -{ - zone->d.wp += nlb; - - if (zone->d.wp == nvme_zone_wr_boundary(zone)) { - nvme_zrm_finish(ns, zone); - } -} - -static void nvme_zoned_zrwa_implicit_flush(NvmeNamespace *ns, NvmeZone *zone, - uint32_t nlbc) -{ 
- uint16_t nzrwafgs = DIV_ROUND_UP(nlbc, ns->zns.zrwafg); - - nlbc = nzrwafgs * ns->zns.zrwafg; - - trace_pci_nvme_zoned_zrwa_implicit_flush(zone->d.zslba, nlbc); - - zone->w_ptr += nlbc; - - nvme_advance_zone_wp(ns, zone, nlbc); -} - -static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req) -{ - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - NvmeZone *zone; - uint64_t slba; - uint32_t nlb; - - slba = le64_to_cpu(rw->slba); - nlb = le16_to_cpu(rw->nlb) + 1; - zone = nvme_get_zone_by_slba(ns, slba); - assert(zone); - - if (zone->d.za & NVME_ZA_ZRWA_VALID) { - uint64_t ezrwa = zone->w_ptr + ns->zns.zrwas - 1; - uint64_t elba = slba + nlb - 1; - - if (elba > ezrwa) { - nvme_zoned_zrwa_implicit_flush(ns, zone, elba - ezrwa); - } - - return; - } - - nvme_advance_zone_wp(ns, zone, nlb); -} - -static inline bool nvme_is_write(NvmeRequest *req) -{ - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - - return rw->opcode == NVME_CMD_WRITE || - rw->opcode == NVME_CMD_ZONE_APPEND || - rw->opcode == NVME_CMD_WRITE_ZEROES; -} - static void nvme_misc_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -2148,10 +1743,6 @@ void nvme_rw_complete_cb(void *opaque, int ret) block_acct_done(stats, acct); } - if (blk_get_zone_model(blk) && nvme_is_write(req)) { - nvme_finalize_zoned_write(ns, req); - } - nvme_enqueue_req_completion(nvme_cq(req), req); } @@ -2856,8 +2447,6 @@ static inline uint16_t nvme_check_copy_mcl(NvmeNamespace *ns, static void nvme_copy_out_completed_cb(void *opaque, int ret) { NvmeCopyAIOCB *iocb = opaque; - NvmeRequest *req = iocb->req; - NvmeNamespace *ns = req->ns; uint32_t nlb; nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL, @@ -2870,10 +2459,6 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret) goto out; } - if (blk_get_zone_model(ns->blkconf.blk)) { - nvme_advance_zone_wp(ns, iocb->zone, nlb); - } - iocb->idx++; iocb->slba += nlb; out: @@ -2982,17 +2567,6 @@ static void nvme_copy_in_completed_cb(void *opaque, 
int ret) goto invalid; } - if (blk_get_zone_model(ns->blkconf.blk)) { - status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb); - if (status) { - goto invalid; - } - - if (!(iocb->zone->d.za & NVME_ZA_ZRWA_VALID)) { - iocb->zone->w_ptr += nlb; - } - } - qemu_iovec_reset(&iocb->iov); qemu_iovec_add(&iocb->iov, iocb->bounce, len); @@ -3076,13 +2650,6 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb) } } - if (blk_get_zone_model(ns->blkconf.blk)) { - status = nvme_check_zone_read(ns, slba, nlb); - if (status) { - goto invalid; - } - } - qemu_iovec_reset(&iocb->iov); qemu_iovec_add(&iocb->iov, iocb->bounce, len); @@ -3152,19 +2719,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) iocb->slba = le64_to_cpu(copy->sdlba); - if (blk_get_zone_model(ns->blkconf.blk)) { - iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba); - if (!iocb->zone) { - status = NVME_LBA_RANGE | NVME_DNR; - goto invalid; - } - - status = nvme_zrm_auto(n, ns, iocb->zone); - if (status) { - goto invalid; - } - } - status = nvme_check_copy_mcl(ns, iocb, nr); if (status) { goto invalid; @@ -3422,14 +2976,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) goto invalid; } - if (blk_get_zone_model(blk)) { - status = nvme_check_zone_read(ns, slba, nlb); - if (status) { - trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); - goto invalid; - } - } - if (NVME_ERR_REC_DULBE(ns->features.err_rec)) { status = nvme_check_dulbe(ns, slba, nlb); if (status) { @@ -3505,8 +3051,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, uint64_t data_size = nvme_l2b(ns, nlb); uint64_t mapped_size = data_size; uint64_t data_offset; - NvmeZone *zone; - NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -3538,32 +3082,20 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } if (blk_get_zone_model(blk)) { - zone = nvme_get_zone_by_slba(ns, slba); - assert(zone); + uint32_t zone_size = 
blk_get_zone_size(blk); + uint32_t zone_idx = slba / zone_size; + int64_t zone_start = zone_idx * zone_size; if (append) { bool piremap = !!(ctrl & NVME_RW_PIREMAP); - if (unlikely(zone->d.za & NVME_ZA_ZRWA_VALID)) { - return NVME_INVALID_ZONE_OP | NVME_DNR; - } - - if (unlikely(slba != zone->d.zslba)) { - trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba); - status = NVME_INVALID_FIELD; - goto invalid; - } - if (n->params.zasl && data_size > (uint64_t)n->page_size << n->params.zasl) { trace_pci_nvme_err_zasl(data_size); return NVME_INVALID_FIELD | NVME_DNR; } - slba = zone->w_ptr; rw->slba = cpu_to_le64(slba); - res->slba = cpu_to_le64(slba); - switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { case NVME_ID_NS_DPS_TYPE_1: if (!piremap) { @@ -3575,7 +3107,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, case NVME_ID_NS_DPS_TYPE_2: if (piremap) { uint32_t reftag = le32_to_cpu(rw->reftag); - rw->reftag = cpu_to_le32(reftag + (slba - zone->d.zslba)); + rw->reftag = cpu_to_le32(reftag + (slba - zone_start)); } break; @@ -3589,19 +3121,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } } - status = nvme_check_zone_write(ns, zone, slba, nlb); - if (status) { - goto invalid; - } - - status = nvme_zrm_auto(n, ns, zone); - if (status) { - goto invalid; - } - - if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) { - zone->w_ptr += nlb; - } } else if (ns->endgrp && ns->endgrp->fdp.enabled) { nvme_do_write_fdp(n, req, slba, nlb); } @@ -3644,6 +3163,23 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return nvme_do_write(n, req, false, true); } +typedef struct NvmeZoneCmdAIOCB { + NvmeRequest *req; + NvmeCmd *cmd; + NvmeCtrl *n; + + union { + struct { + uint32_t partial; + unsigned int nr_zones; + BlockZoneDescriptor *zones; + } zone_report_data; + struct { + int64_t offset; + } zone_append_data; + }; +} NvmeZoneCmdAIOCB; + static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req) { 
return nvme_do_write(n, req, true, false); @@ -3655,7 +3191,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, uint32_t dw10 = le32_to_cpu(c->cdw10); uint32_t dw11 = le32_to_cpu(c->cdw11); - if (blk_get_zone_model(ns->blkconf.blk)) { + if (!blk_get_zone_model(ns->blkconf.blk)) { trace_pci_nvme_err_invalid_opc(c->opcode); return NVME_INVALID_OPCODE | NVME_DNR; } @@ -3673,198 +3209,21 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, return NVME_SUCCESS; } -typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, NvmeZoneState, - NvmeRequest *); - -enum NvmeZoneProcessingMask { - NVME_PROC_CURRENT_ZONE = 0, - NVME_PROC_OPENED_ZONES = 1 << 0, - NVME_PROC_CLOSED_ZONES = 1 << 1, - NVME_PROC_READ_ONLY_ZONES = 1 << 2, - NVME_PROC_FULL_ZONES = 1 << 3, -}; - -static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd; - int flags = 0; - - if (cmd->zsflags & NVME_ZSFLAG_ZRWA_ALLOC) { - uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs); - - if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) { - return NVME_INVALID_ZONE_OP | NVME_DNR; - } - - if (zone->w_ptr % ns->zns.zrwafg) { - return NVME_NOZRWA | NVME_DNR; - } - - flags = NVME_ZRM_ZRWA; - } - - return nvme_zrm_open_flags(nvme_ctrl(req), ns, zone, flags); -} - -static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - return nvme_zrm_close(ns, zone); -} - -static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - return nvme_zrm_finish(ns, zone); -} - -static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState state, NvmeRequest *req) -{ - switch (state) { - case NVME_ZONE_STATE_READ_ONLY: - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE); - /* fall through */ - case NVME_ZONE_STATE_OFFLINE: - return NVME_SUCCESS; - 
default: - return NVME_ZONE_INVAL_TRANSITION; - } -} - -static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone) -{ - uint16_t status; - uint8_t state = nvme_get_zone_state(zone); - - if (state == NVME_ZONE_STATE_EMPTY) { - status = nvme_aor_check(ns, 1, 0); - if (status) { - return status; - } - nvme_aor_inc_active(ns); - zone->d.za |= NVME_ZA_ZD_EXT_VALID; - nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); - return NVME_SUCCESS; - } - - return NVME_ZONE_INVAL_TRANSITION; -} - -static uint16_t nvme_bulk_proc_zone(NvmeNamespace *ns, NvmeZone *zone, - enum NvmeZoneProcessingMask proc_mask, - op_handler_t op_hndlr, NvmeRequest *req) -{ - uint16_t status = NVME_SUCCESS; - NvmeZoneState zs = nvme_get_zone_state(zone); - bool proc_zone; - - switch (zs) { - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - proc_zone = proc_mask & NVME_PROC_OPENED_ZONES; - break; - case NVME_ZONE_STATE_CLOSED: - proc_zone = proc_mask & NVME_PROC_CLOSED_ZONES; - break; - case NVME_ZONE_STATE_READ_ONLY: - proc_zone = proc_mask & NVME_PROC_READ_ONLY_ZONES; - break; - case NVME_ZONE_STATE_FULL: - proc_zone = proc_mask & NVME_PROC_FULL_ZONES; - break; - default: - proc_zone = false; - } - - if (proc_zone) { - status = op_hndlr(ns, zone, zs, req); - } - - return status; -} - -static uint16_t nvme_do_zone_op(NvmeNamespace *ns, NvmeZone *zone, - enum NvmeZoneProcessingMask proc_mask, - op_handler_t op_hndlr, NvmeRequest *req) -{ - NvmeZone *next; - uint16_t status = NVME_SUCCESS; - int i; - - if (!proc_mask) { - status = op_hndlr(ns, zone, nvme_get_zone_state(zone), req); - } else { - if (proc_mask & NVME_PROC_CLOSED_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - if (proc_mask & NVME_PROC_OPENED_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { - status = 
nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - - QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - if (proc_mask & NVME_PROC_FULL_ZONES) { - QTAILQ_FOREACH_SAFE(zone, &ns->full_zones, entry, next) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - - if (proc_mask & NVME_PROC_READ_ONLY_ZONES) { - for (i = 0; i < ns->num_zones; i++, zone++) { - status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr, - req); - if (status && status != NVME_NO_COMPLETE) { - goto out; - } - } - } - } - -out: - return status; -} - -typedef struct NvmeZoneResetAIOCB { +typedef struct NvmeZoneMgmtAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; int ret; bool all; - int idx; - NvmeZone *zone; -} NvmeZoneResetAIOCB; + uint64_t offset; + uint64_t len; + BlockZoneOp op; +} NvmeZoneMgmtAIOCB; -static void nvme_zone_reset_cancel(BlockAIOCB *aiocb) +static void nvme_zone_mgmt_send_cancel(BlockAIOCB *aiocb) { - NvmeZoneResetAIOCB *iocb = container_of(aiocb, NvmeZoneResetAIOCB, common); - NvmeRequest *req = iocb->req; - NvmeNamespace *ns = req->ns; - - iocb->idx = ns->num_zones; + NvmeZoneMgmtAIOCB *iocb = container_of(aiocb, NvmeZoneMgmtAIOCB, common); iocb->ret = -ECANCELED; @@ -3874,117 +3233,66 @@ static void nvme_zone_reset_cancel(BlockAIOCB *aiocb) } } -static const AIOCBInfo nvme_zone_reset_aiocb_info = { - .aiocb_size = sizeof(NvmeZoneResetAIOCB), - .cancel_async = nvme_zone_reset_cancel, +static const AIOCBInfo nvme_zone_mgmt_aiocb_info = { + .aiocb_size = sizeof(NvmeZoneMgmtAIOCB), + .cancel_async = nvme_zone_mgmt_send_cancel, }; -static void nvme_zone_reset_cb(void *opaque, int ret); +static void nvme_zone_mgmt_send_cb(void *opaque, int ret); 
-static void nvme_zone_reset_epilogue_cb(void *opaque, int ret) +static void nvme_zone_mgmt_send_epilogue_cb(void *opaque, int ret) { - NvmeZoneResetAIOCB *iocb = opaque; - NvmeRequest *req = iocb->req; - NvmeNamespace *ns = req->ns; - int64_t moff; - int count; + NvmeZoneMgmtAIOCB *iocb = opaque; + NvmeNamespace *ns = iocb->req->ns; if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { - goto out; + iocb->ret = ret; + error_report("Invalid zone mgmt op %d", ret); + goto done; } - moff = nvme_moff(ns, iocb->zone->d.zslba); - count = nvme_m2b(ns, ns->zone_size); - - iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, moff, count, - BDRV_REQ_MAY_UNMAP, - nvme_zone_reset_cb, iocb); return; -out: - nvme_zone_reset_cb(iocb, ret); +done: + iocb->aiocb = NULL; + iocb->common.cb(iocb->common.opaque, iocb->ret); + qemu_aio_unref(iocb); } -static void nvme_zone_reset_cb(void *opaque, int ret) +static void nvme_zone_mgmt_send_cb(void *opaque, int ret) { - NvmeZoneResetAIOCB *iocb = opaque; + NvmeZoneMgmtAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeNamespace *ns = req->ns; + BlockBackend *blk = ns->blkconf.blk; - if (iocb->ret < 0) { - goto done; - } else if (ret < 0) { - iocb->ret = ret; - goto done; - } - - if (iocb->zone) { - nvme_zrm_reset(ns, iocb->zone); - - if (!iocb->all) { - goto done; - } - } - - while (iocb->idx < ns->num_zones) { - NvmeZone *zone = &ns->zone_array[iocb->idx++]; - - switch (nvme_get_zone_state(zone)) { - case NVME_ZONE_STATE_EMPTY: - if (!iocb->all) { - goto done; - } - - continue; - - case NVME_ZONE_STATE_EXPLICITLY_OPEN: - case NVME_ZONE_STATE_IMPLICITLY_OPEN: - case NVME_ZONE_STATE_CLOSED: - case NVME_ZONE_STATE_FULL: - iocb->zone = zone; - break; - - default: - continue; - } - - trace_pci_nvme_zns_zone_reset(zone->d.zslba); - - iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, - nvme_l2b(ns, zone->d.zslba), - nvme_l2b(ns, ns->zone_size), - BDRV_REQ_MAY_UNMAP, - nvme_zone_reset_epilogue_cb, - iocb); - return; - } - -done: - iocb->aiocb = 
NULL; - - iocb->common.cb(iocb->common.opaque, iocb->ret); - qemu_aio_unref(iocb); + iocb->aiocb = blk_aio_zone_mgmt(blk, iocb->op, iocb->offset, + iocb->len, + nvme_zone_mgmt_send_epilogue_cb, iocb); + return; } -static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, +static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, uint32_t zidx, uint64_t elba, NvmeRequest *req) { NvmeNamespace *ns = req->ns; uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs); - uint64_t wp = zone->d.wp; - uint32_t nlb = elba - wp + 1; - uint16_t status; - + BlockZoneWps *wps = blk_get_zone_wps(ns->blkconf.blk); + uint64_t *wp = &wps->wp[zidx]; + uint64_t raw_wpv = BDRV_ZP_GET_WP(*wp); + uint8_t za = BDRV_ZP_GET_ZA(raw_wpv); + uint64_t wpv = BDRV_ZP_GET_WP(raw_wpv); + uint32_t nlb = elba - wpv + 1; if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) { return NVME_INVALID_ZONE_OP | NVME_DNR; } - if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) { + if (!(za & NVME_ZA_ZRWA_VALID)) { return NVME_INVALID_FIELD | NVME_DNR; } - if (elba < wp || elba > wp + ns->zns.zrwas) { + if (elba < wpv || elba > wpv + ns->zns.zrwas) { return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR; } @@ -3992,37 +3300,36 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, return NVME_INVALID_FIELD | NVME_DNR; } - status = nvme_zrm_auto(n, ns, zone); - if (status) { - return status; - } - - zone->w_ptr += nlb; - - nvme_advance_zone_wp(ns, zone, nlb); + *wp += nlb; return NVME_SUCCESS; } static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, - uint32_t zone_idx) + uint32_t zone_idx) { return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)]; } +#define BLK_ZO_UNSUP 0x22 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) { NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd; NvmeNamespace *ns = req->ns; - NvmeZone *zone; - NvmeZoneResetAIOCB *iocb; - uint8_t *zd_ext; + NvmeZoneMgmtAIOCB *iocb; uint64_t slba = 0; uint32_t zone_idx = 0; uint16_t status; 
uint8_t action = cmd->zsa; + uint8_t *zd_ext; + uint64_t offset, len; + BlockBackend *blk = ns->blkconf.blk; + uint32_t zone_size = blk_get_zone_size(blk); + uint64_t size = zone_size * blk_get_nr_zones(blk); + BlockZoneOp op = BLK_ZO_UNSUP; + /* support flag, true when the op is supported */ + bool flag = true; bool all; - enum NvmeZoneProcessingMask proc_mask = NVME_PROC_CURRENT_ZONE; all = cmd->zsflags & NVME_ZSFLAG_SELECT_ALL; @@ -4033,82 +3340,51 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) if (status) { return status; } - } - - zone = &ns->zone_array[zone_idx]; - if (slba != zone->d.zslba && action != NVME_ZONE_ACTION_ZRWA_FLUSH) { - trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba); - return NVME_INVALID_FIELD | NVME_DNR; + len = zone_size; + } else { + len = size; } switch (action) { case NVME_ZONE_ACTION_OPEN: - if (all) { - proc_mask = NVME_PROC_CLOSED_ZONES; - } + op = BLK_ZO_OPEN; trace_pci_nvme_open_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_open_zone, req); break; case NVME_ZONE_ACTION_CLOSE: - if (all) { - proc_mask = NVME_PROC_OPENED_ZONES; - } + op = BLK_ZO_CLOSE; trace_pci_nvme_close_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_close_zone, req); break; case NVME_ZONE_ACTION_FINISH: - if (all) { - proc_mask = NVME_PROC_OPENED_ZONES | NVME_PROC_CLOSED_ZONES; - } + op = BLK_ZO_FINISH; trace_pci_nvme_finish_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_finish_zone, req); break; case NVME_ZONE_ACTION_RESET: + op = BLK_ZO_RESET; trace_pci_nvme_reset_zone(slba, zone_idx, all); - - iocb = blk_aio_get(&nvme_zone_reset_aiocb_info, ns->blkconf.blk, - nvme_misc_cb, req); - - iocb->req = req; - iocb->ret = 0; - iocb->all = all; - iocb->idx = zone_idx; - iocb->zone = NULL; - - req->aiocb = &iocb->common; - nvme_zone_reset_cb(iocb, 0); - - return NVME_NO_COMPLETE; + break; case NVME_ZONE_ACTION_OFFLINE: - if (all) { 
- proc_mask = NVME_PROC_READ_ONLY_ZONES; - } + op = BLK_ZO_OFFLINE; trace_pci_nvme_offline_zone(slba, zone_idx, all); - status = nvme_do_zone_op(ns, zone, proc_mask, nvme_offline_zone, req); break; case NVME_ZONE_ACTION_SET_ZD_EXT: + int zd_ext_size = blk_get_zd_ext_size(blk); trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) { + if (all || !zd_ext_size) { return NVME_INVALID_FIELD | NVME_DNR; } zd_ext = nvme_get_zd_extension(ns, zone_idx); - status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req); + status = nvme_h2c(n, zd_ext, zd_ext_size, req); if (status) { trace_pci_nvme_err_zd_extension_map_error(zone_idx); return status; } - - status = nvme_set_zd_ext(ns, zone); - if (status == NVME_SUCCESS) { - trace_pci_nvme_zd_extension_set(zone_idx); - return status; - } + trace_pci_nvme_zd_extension_set(zone_idx); break; case NVME_ZONE_ACTION_ZRWA_FLUSH: @@ -4116,16 +3392,34 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - return nvme_zone_mgmt_send_zrwa_flush(n, zone, slba, req); + return nvme_zone_mgmt_send_zrwa_flush(n, zone_idx, slba, req); default: trace_pci_nvme_err_invalid_mgmt_action(action); status = NVME_INVALID_FIELD; } + if (flag && (op != BLK_ZO_UNSUP)) { + iocb = blk_aio_get(&nvme_zone_mgmt_aiocb_info, ns->blkconf.blk, + nvme_misc_cb, req); + iocb->req = req; + iocb->ret = 0; + iocb->all = all; + /* Convert it to bytes for accessing block layers */ + offset = nvme_l2b(ns, slba); + iocb->offset = offset; + iocb->len = len; + iocb->op = op; + + req->aiocb = &iocb->common; + nvme_zone_mgmt_send_cb(iocb, 0); + + return NVME_NO_COMPLETE; + } + if (status == NVME_ZONE_INVAL_TRANSITION) { trace_pci_nvme_err_invalid_zone_state_transition(action, slba, - zone->d.za); + TO_DO_ZA); } if (status) { status |= NVME_DNR; @@ -4134,50 +3428,144 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) return status; } -static 
bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl) +static bool nvme_zone_matches_filter(uint32_t zafs, BlockZoneState zs) { - NvmeZoneState zs = nvme_get_zone_state(zl); - switch (zafs) { case NVME_ZONE_REPORT_ALL: return true; case NVME_ZONE_REPORT_EMPTY: - return zs == NVME_ZONE_STATE_EMPTY; + return zs == BLK_ZS_EMPTY; case NVME_ZONE_REPORT_IMPLICITLY_OPEN: - return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN; + return zs == BLK_ZS_IOPEN; case NVME_ZONE_REPORT_EXPLICITLY_OPEN: - return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN; + return zs == BLK_ZS_EOPEN; case NVME_ZONE_REPORT_CLOSED: - return zs == NVME_ZONE_STATE_CLOSED; + return zs == BLK_ZS_CLOSED; case NVME_ZONE_REPORT_FULL: - return zs == NVME_ZONE_STATE_FULL; + return zs == BLK_ZS_FULL; case NVME_ZONE_REPORT_READ_ONLY: - return zs == NVME_ZONE_STATE_READ_ONLY; + return zs == BLK_ZS_RDONLY; case NVME_ZONE_REPORT_OFFLINE: - return zs == NVME_ZONE_STATE_OFFLINE; + return zs == BLK_ZS_OFFLINE; default: return false; } } +static void nvme_zone_mgmt_recv_completed_cb(void *opaque, int ret) +{ + NvmeZoneCmdAIOCB *iocb = opaque; + NvmeRequest *req = iocb->req; + NvmeCmd *cmd = iocb->cmd; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + int64_t zrp_size, j = 0; + uint32_t zrasf; + g_autofree void *buf = NULL; + void *buf_p; + NvmeZoneReportHeader *zrp_hdr; + uint64_t nz = iocb->zone_report_data.nr_zones; + BlockZoneDescriptor *in_zone = iocb->zone_report_data.zones; + NvmeZoneDescr *out_zone; + + if (ret < 0) { + error_report("Invalid zone recv %d", ret); + goto out; + } + + zrasf = (dw13 >> 8) & 0xff; + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { + error_report("Nvme invalid field"); + return; + } + + zrp_size = sizeof(NvmeZoneReportHeader) + sizeof(NvmeZoneDescr) * nz; + buf = g_malloc0(zrp_size); + + zrp_hdr = buf; + zrp_hdr->nr_zones = cpu_to_le64(nz); + buf_p = buf + sizeof(NvmeZoneReportHeader); + + for (; j < nz; j++) { + out_zone = buf_p; + buf_p += sizeof(NvmeZoneDescr); + + BlockZoneState zs = in_zone[j].state; 
+ if (!nvme_zone_matches_filter(zrasf, zs)) { + continue; + } + + *out_zone = (NvmeZoneDescr) { + .zslba = nvme_b2l(req->ns, in_zone[j].start), + .zcap = nvme_b2l(req->ns, in_zone[j].cap), + .wp = nvme_b2l(req->ns, in_zone[j].wp), + }; + + switch (in_zone[j].type) { + case BLK_ZT_CONV: + out_zone->zt = NVME_ZONE_TYPE_RESERVED; + break; + case BLK_ZT_SWR: + out_zone->zt = NVME_ZONE_TYPE_SEQ_WRITE; + break; + case BLK_ZT_SWP: + out_zone->zt = NVME_ZONE_TYPE_RESERVED; + break; + default: + g_assert_not_reached(); + } + + switch (zs) { + case BLK_ZS_RDONLY: + out_zone->zs = NVME_ZONE_STATE_READ_ONLY << 4; + break; + case BLK_ZS_OFFLINE: + out_zone->zs = NVME_ZONE_STATE_OFFLINE << 4; + break; + case BLK_ZS_EMPTY: + out_zone->zs = NVME_ZONE_STATE_EMPTY << 4; + break; + case BLK_ZS_CLOSED: + out_zone->zs = NVME_ZONE_STATE_CLOSED << 4; + break; + case BLK_ZS_FULL: + out_zone->zs = NVME_ZONE_STATE_FULL << 4; + break; + case BLK_ZS_EOPEN: + out_zone->zs = NVME_ZONE_STATE_EXPLICITLY_OPEN << 4; + break; + case BLK_ZS_IOPEN: + out_zone->zs = NVME_ZONE_STATE_IMPLICITLY_OPEN << 4; + break; + case BLK_ZS_NOT_WP: + out_zone->zs = NVME_ZONE_STATE_RESERVED << 4; + break; + default: + g_assert_not_reached(); + } + } + + nvme_c2h(iocb->n, (uint8_t *)buf, zrp_size, req); + +out: + g_free(iocb->zone_report_data.zones); + g_free(iocb); + return; +} + static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = (NvmeCmd *)&req->cmd; NvmeNamespace *ns = req->ns; + BlockBackend *blk = ns->blkconf.blk; + NvmeZoneCmdAIOCB *iocb; /* cdw12 is zero-based number of dwords to return. 
Convert to bytes */ uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2; uint32_t dw13 = le32_to_cpu(cmd->cdw13); - uint32_t zone_idx, zra, zrasf, partial; - uint64_t max_zones, nr_zones = 0; + uint32_t zone_idx, zra, zrasf, partial, nr_zones; uint16_t status; uint64_t slba; - NvmeZoneDescr *z; - NvmeZone *zone; - NvmeZoneReportHeader *header; - void *buf, *buf_p; size_t zone_entry_sz; - int i; - + int64_t offset; req->status = NVME_SUCCESS; status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); @@ -4208,64 +3596,31 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) return status; } - partial = (dw13 >> 16) & 0x01; - zone_entry_sz = sizeof(NvmeZoneDescr); if (zra == NVME_ZONE_REPORT_EXTENDED) { - zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ; + zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk); } - max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; - buf = g_malloc0(data_size); - - zone = &ns->zone_array[zone_idx]; - for (i = zone_idx; i < ns->num_zones; i++) { - if (partial && nr_zones >= max_zones) { - break; - } - if (nvme_zone_matches_filter(zrasf, zone++)) { - nr_zones++; - } - } - header = buf; - header->nr_zones = cpu_to_le64(nr_zones); - - buf_p = buf + sizeof(NvmeZoneReportHeader); - for (; zone_idx < ns->num_zones && max_zones > 0; zone_idx++) { - zone = &ns->zone_array[zone_idx]; - if (nvme_zone_matches_filter(zrasf, zone)) { - z = buf_p; - buf_p += sizeof(NvmeZoneDescr); - - z->zt = zone->d.zt; - z->zs = zone->d.zs; - z->zcap = cpu_to_le64(zone->d.zcap); - z->zslba = cpu_to_le64(zone->d.zslba); - z->za = zone->d.za; - - if (nvme_wp_is_valid(zone)) { - z->wp = cpu_to_le64(zone->d.wp); - } else { - z->wp = cpu_to_le64(~0ULL); - } - - if (zra == NVME_ZONE_REPORT_EXTENDED) { - int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk); - if (zone->d.za & NVME_ZA_ZD_EXT_VALID) { - memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), - zd_ext_size); - } - buf_p += zd_ext_size; - } - - max_zones--; - 
} + offset = nvme_l2b(ns, slba); + nr_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; + partial = (dw13 >> 16) & 0x01; + if (!partial) { + nr_zones = blk_get_nr_zones(blk); + offset = 0; } - status = nvme_c2h(n, (uint8_t *)buf, data_size, req); - - g_free(buf); - + iocb = g_malloc0(sizeof(NvmeZoneCmdAIOCB)); + iocb->req = req; + iocb->n = n; + iocb->cmd = cmd; + iocb->zone_report_data.nr_zones = nr_zones; + iocb->zone_report_data.zones = g_malloc0( + sizeof(BlockZoneDescriptor) * nr_zones); + + blk_aio_zone_report(blk, offset, + &iocb->zone_report_data.nr_zones, + iocb->zone_report_data.zones, + nvme_zone_mgmt_recv_completed_cb, iocb); return status; } diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 45c08391f5..63106a0f27 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -219,36 +219,10 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp) static void nvme_ns_zoned_init_state(NvmeNamespace *ns) { BlockBackend *blk = ns->blkconf.blk; - uint64_t start = 0, zone_size = ns->zone_size; - uint64_t capacity = ns->num_zones * zone_size; - NvmeZone *zone; - int i; - - ns->zone_array = g_new0(NvmeZone, ns->num_zones); if (blk_get_zone_extension(blk)) { ns->zd_extensions = blk_get_zone_extension(blk); } - QTAILQ_INIT(&ns->exp_open_zones); - QTAILQ_INIT(&ns->imp_open_zones); - QTAILQ_INIT(&ns->closed_zones); - QTAILQ_INIT(&ns->full_zones); - - zone = ns->zone_array; - for (i = 0; i < ns->num_zones; i++, zone++) { - if (start + zone_size > capacity) { - zone_size = capacity - start; - } - zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE; - nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); - zone->d.za = 0; - zone->d.zcap = ns->zone_capacity; - zone->d.zslba = start; - zone->d.wp = start; - zone->w_ptr = start; - start += zone_size; - } - ns->zone_size_log2 = 0; if (is_power_of_2(ns->zone_size)) { ns->zone_size_log2 = 63 - clz64(ns->zone_size); @@ -319,56 +293,12 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) ns->id_ns_zoned = id_ns_z; } 
-static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone) -{ - uint8_t state; - - zone->w_ptr = zone->d.wp; - state = nvme_get_zone_state(zone); - if (zone->d.wp != zone->d.zslba || - (zone->d.za & NVME_ZA_ZD_EXT_VALID)) { - if (state != NVME_ZONE_STATE_CLOSED) { - trace_pci_nvme_clear_ns_close(state, zone->d.zslba); - nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED); - } - nvme_aor_inc_active(ns); - QTAILQ_INSERT_HEAD(&ns->closed_zones, zone, entry); - } else { - trace_pci_nvme_clear_ns_reset(state, zone->d.zslba); - if (zone->d.za & NVME_ZA_ZRWA_VALID) { - zone->d.za &= ~NVME_ZA_ZRWA_VALID; - ns->zns.numzrwa++; - } - nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); - } -} - /* * Close all the zones that are currently open. */ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns) { - NvmeZone *zone, *next; - - QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { - QTAILQ_REMOVE(&ns->closed_zones, zone, entry); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { - QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); - nvme_aor_dec_open(ns); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { - QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); - nvme_aor_dec_open(ns); - nvme_aor_dec_active(ns); - nvme_clear_zone(ns, zone); - } - - assert(ns->nr_open_zones == 0); + /* Set states (exp/imp_open/closed/full) to empty */ } static NvmeRuHandle *nvme_find_ruh_by_attr(NvmeEnduranceGroup *endgrp, @@ -662,7 +592,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns) { if (blk_get_zone_model(ns->blkconf.blk)) { g_free(ns->id_ns_zoned); - g_free(ns->zone_array); } if (ns->endgrp && ns->endgrp->fdp.enabled) { @@ -776,10 +705,6 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127), DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace, params.cross_zone_read, false), - 
DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace, - params.max_active_zones, 0), - DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace, - params.max_open_zones, 0), DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0), DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0), DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1), diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 37007952fc..c2d1b07f88 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -150,6 +150,9 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys, #define NVME_NS(obj) \ OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS) +#define TO_DO_STATE 0 +#define TO_DO_ZA 0 + typedef struct NvmeZone { NvmeZoneDescr d; uint64_t w_ptr; @@ -190,8 +193,6 @@ typedef struct NvmeNamespaceParams { uint8_t msrc; bool cross_zone_read; - uint32_t max_active_zones; - uint32_t max_open_zones; uint32_t numzrwa; uint64_t zrwas; @@ -228,11 +229,10 @@ typedef struct NvmeNamespace { QTAILQ_ENTRY(NvmeNamespace) entry; NvmeIdNsZoned *id_ns_zoned; - NvmeZone *zone_array; - QTAILQ_HEAD(, NvmeZone) exp_open_zones; - QTAILQ_HEAD(, NvmeZone) imp_open_zones; - QTAILQ_HEAD(, NvmeZone) closed_zones; - QTAILQ_HEAD(, NvmeZone) full_zones; + uint32_t *exp_open_zones; + uint32_t *imp_open_zones; + uint32_t *closed_zones; + uint32_t *full_zones; uint32_t num_zones; uint64_t zone_size; uint64_t zone_capacity; @@ -265,6 +265,12 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return 0; } +/* Bytes to LBAs */ +static inline uint64_t nvme_b2l(NvmeNamespace *ns, uint64_t lba) +{ + return lba >> ns->lbaf.ds; +} + static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba) { return lba << ns->lbaf.ds; @@ -285,70 +291,9 @@ static inline bool nvme_ns_ext(NvmeNamespace *ns) return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas); } -static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone) +static inline NvmeZoneState nvme_get_zone_state(uint64_t wp) { - return zone->d.zs >> 4; -} - 
-static inline void nvme_set_zone_state(NvmeZone *zone, NvmeZoneState state) -{ - zone->d.zs = state << 4; -} - -static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone) -{ - return zone->d.zslba + ns->zone_size; -} - -static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone) -{ - return zone->d.zslba + zone->d.zcap; -} - -static inline bool nvme_wp_is_valid(NvmeZone *zone) -{ - uint8_t st = nvme_get_zone_state(zone); - - return st != NVME_ZONE_STATE_FULL && - st != NVME_ZONE_STATE_READ_ONLY && - st != NVME_ZONE_STATE_OFFLINE; -} - -static inline void nvme_aor_inc_open(NvmeNamespace *ns) -{ - assert(ns->nr_open_zones >= 0); - if (ns->params.max_open_zones) { - ns->nr_open_zones++; - assert(ns->nr_open_zones <= ns->params.max_open_zones); - } -} - -static inline void nvme_aor_dec_open(NvmeNamespace *ns) -{ - if (ns->params.max_open_zones) { - assert(ns->nr_open_zones > 0); - ns->nr_open_zones--; - } - assert(ns->nr_open_zones >= 0); -} - -static inline void nvme_aor_inc_active(NvmeNamespace *ns) -{ - assert(ns->nr_active_zones >= 0); - if (ns->params.max_active_zones) { - ns->nr_active_zones++; - assert(ns->nr_active_zones <= ns->params.max_active_zones); - } -} - -static inline void nvme_aor_dec_active(NvmeNamespace *ns) -{ - if (ns->params.max_active_zones) { - assert(ns->nr_active_zones > 0); - ns->nr_active_zones--; - assert(ns->nr_active_zones >= ns->nr_open_zones); - } - assert(ns->nr_active_zones >= 0); + return wp >> 60; } static inline void nvme_fdp_stat_inc(uint64_t *a, uint64_t b) diff --git a/include/block/block-common.h b/include/block/block-common.h index a846023a09..7690b05149 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -87,6 +87,7 @@ typedef enum BlockZoneOp { BLK_ZO_CLOSE, BLK_ZO_FINISH, BLK_ZO_RESET, + BLK_ZO_OFFLINE, } BlockZoneOp; typedef enum BlockZoneModel { @@ -266,6 +267,13 @@ typedef enum { */ #define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63)) +/* + * Clear the zone state, type and 
attribute information in the wp. + */ +#define BDRV_ZP_GET_WP(wp) ((wp << 6) >> 6) +#define BDRV_ZP_GET_ZS(wp) (wp >> 60) +#define BDRV_ZP_GET_ZA(wp) (wp & ((1ULL << 8) - 1ULL) << 51) + #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \ INT_MAX >> BDRV_SECTOR_BITS) #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 825b8dac55..9a81f99eee 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -916,6 +916,8 @@ typedef struct BlockLimits { /* size of data that is associated with a zone in bytes */ uint32_t zd_extension_size; + + uint8_t zone_attribute; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; From patchwork Mon Jan 22 19:00:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526110
From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf, Paolo Bonzini, stefanha@redhat.com, Peter Xu, David Hildenbrand, dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz, Eric Blake, Markus Armbruster, Philippe Mathieu-Daudé, dlemoal@kernel.org, Keith Busch, Klaus Jensen, Sam Li Subject: [RFC v3 6/7] hw/nvme: refactor zone append write using block layer APIs Date: Mon, 22 Jan 2024 20:00:12 +0100 Message-Id: <20240122190013.41302-7-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> Signed-off-by: Sam Li --- block/qcow2.c | 2 +- hw/nvme/ctrl.c | 190 ++++++++++++++++++++++++++++++++----------- include/sysemu/dma.h | 3 + system/dma-helpers.c | 17 ++++ 4 files changed, 162 insertions(+), 50 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 0bb249fa6e..43ee0f47b9 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2395,7 +2395,7 @@ static
void qcow2_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.max_open_zones = s->zoned_header.max_open_zones; bs->bl.zone_size = s->zoned_header.zone_size; bs->bl.zone_capacity = s->zoned_header.zone_capacity; - bs->bl.write_granularity = BDRV_SECTOR_SIZE; + bs->bl.write_granularity = BDRV_SECTOR_SIZE; /* physical block size */ bs->bl.zd_extension_size = s->zoned_header.zd_extension_size; } diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index e31aa52c06..de41d8bac8 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -1726,6 +1726,95 @@ static void nvme_misc_cb(void *opaque, int ret) nvme_enqueue_req_completion(nvme_cq(req), req); } +typedef struct NvmeZoneCmdAIOCB { + NvmeRequest *req; + NvmeCmd *cmd; + NvmeCtrl *n; + + union { + struct { + uint32_t partial; + unsigned int nr_zones; + BlockZoneDescriptor *zones; + } zone_report_data; + struct { + int64_t offset; + } zone_append_data; + }; +} NvmeZoneCmdAIOCB; + +static void nvme_blk_zone_append_complete_cb(void *opaque, int ret) +{ + NvmeZoneCmdAIOCB *cb = opaque; + NvmeRequest *req = cb->req; + int64_t *offset = (int64_t *)&req->cqe; + + if (ret) { + nvme_aio_err(req, ret); + } + + *offset = nvme_b2l(req->ns, cb->zone_append_data.offset); + nvme_enqueue_req_completion(nvme_cq(req), req); + g_free(cb); +} + +static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset, + uint32_t align, + BlockCompletionFunc *cb, + NvmeZoneCmdAIOCB *aiocb) +{ + NvmeRequest *req = aiocb->req; + assert(req->sg.flags & NVME_SG_ALLOC); + + if (req->sg.flags & NVME_SG_DMA) { + req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset, + align, cb, aiocb); + } else { + req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0, + cb, aiocb); + } +} + +static void nvme_zone_append_cb(void *opaque, int ret) +{ + NvmeZoneCmdAIOCB *aiocb = opaque; + NvmeRequest *req = aiocb->req; + NvmeNamespace *ns = req->ns; + + BlockBackend *blk = ns->blkconf.blk; + + trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); + + 
if (ret) { + goto out; + } + + if (ns->lbaf.ms) { + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + int64_t *offset = &aiocb->zone_append_data.offset; + + if (nvme_ns_ext(ns) || req->cmd.mptr) { + uint16_t status; + + nvme_sg_unmap(&req->sg); + status = nvme_map_mdata(nvme_ctrl(req), nlb, req); + if (status) { + ret = -EFAULT; + goto out; + } + + return nvme_blk_zone_append(blk, offset, 1, + nvme_blk_zone_append_complete_cb, + aiocb); + } + } + +out: + nvme_blk_zone_append_complete_cb(aiocb, ret); +} + + void nvme_rw_complete_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -3052,6 +3141,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, uint64_t mapped_size = data_size; uint64_t data_offset; BlockBackend *blk = ns->blkconf.blk; + BlockZoneWps *wps = blk_get_zone_wps(blk); + uint32_t zone_size = blk_get_zone_size(blk); + uint32_t zone_idx; uint16_t status; if (nvme_ns_ext(ns)) { @@ -3082,42 +3174,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, } if (blk_get_zone_model(blk)) { - uint32_t zone_size = blk_get_zone_size(blk); - uint32_t zone_idx = slba / zone_size; - int64_t zone_start = zone_idx * zone_size; + assert(wps); + if (zone_size) { + zone_idx = slba / zone_size; + int64_t zone_start = zone_idx * zone_size; + + if (append) { + bool piremap = !!(ctrl & NVME_RW_PIREMAP); + + if (n->params.zasl && + data_size > (uint64_t) + n->page_size << n->params.zasl) { + trace_pci_nvme_err_zasl(data_size); + return NVME_INVALID_FIELD | NVME_DNR; + } - if (append) { - bool piremap = !!(ctrl & NVME_RW_PIREMAP); + rw->slba = cpu_to_le64(slba); - if (n->params.zasl && - data_size > (uint64_t)n->page_size << n->params.zasl) { - trace_pci_nvme_err_zasl(data_size); - return NVME_INVALID_FIELD | NVME_DNR; - } + switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { + case NVME_ID_NS_DPS_TYPE_1: + if (!piremap) { + return NVME_INVALID_PROT_INFO | NVME_DNR; + } - rw->slba = 
cpu_to_le64(slba); - switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) { - case NVME_ID_NS_DPS_TYPE_1: - if (!piremap) { - return NVME_INVALID_PROT_INFO | NVME_DNR; - } + /* fallthrough */ - /* fallthrough */ + case NVME_ID_NS_DPS_TYPE_2: + if (piremap) { + uint32_t reftag = le32_to_cpu(rw->reftag); + rw->reftag = + cpu_to_le32(reftag + (slba - zone_start)); + } - case NVME_ID_NS_DPS_TYPE_2: - if (piremap) { - uint32_t reftag = le32_to_cpu(rw->reftag); - rw->reftag = cpu_to_le32(reftag + (slba - zone_start)); - } + break; - break; + case NVME_ID_NS_DPS_TYPE_3: + if (piremap) { + return NVME_INVALID_PROT_INFO | NVME_DNR; + } - case NVME_ID_NS_DPS_TYPE_3: - if (piremap) { - return NVME_INVALID_PROT_INFO | NVME_DNR; + break; } - - break; } } @@ -3137,9 +3234,21 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, goto invalid; } - block_acct_start(blk_get_stats(blk), &req->acct, data_size, - BLOCK_ACCT_WRITE); - nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req); + if (append) { + NvmeZoneCmdAIOCB *cb = g_malloc(sizeof(NvmeZoneCmdAIOCB)); + cb->req = req; + cb->zone_append_data.offset = data_offset; + + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_ZONE_APPEND); + nvme_blk_zone_append(blk, &cb->zone_append_data.offset, + blk_get_write_granularity(blk), + nvme_zone_append_cb, cb); + } else { + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); + nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } } else { req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size, BDRV_REQ_MAY_UNMAP, nvme_rw_cb, @@ -3163,24 +3272,7 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return nvme_do_write(n, req, false, true); } -typedef struct NvmeZoneCmdAIOCB { - NvmeRequest *req; - NvmeCmd *cmd; - NvmeCtrl *n; - - union { - struct { - uint32_t partial; - unsigned int nr_zones; - BlockZoneDescriptor *zones; - } zone_report_data; - struct { - 
int64_t offset; - } zone_append_data; - }; -} NvmeZoneCmdAIOCB; - -static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req) { return nvme_do_write(n, req, true, false); } diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h index a1ac5bc1b5..680e0b5477 100644 --- a/include/sysemu/dma.h +++ b/include/sysemu/dma.h @@ -301,6 +301,9 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk, BlockAIOCB *dma_blk_write(BlockBackend *blk, QEMUSGList *sg, uint64_t offset, uint32_t align, BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *dma_blk_zone_append(BlockBackend *blk, + QEMUSGList *sg, int64_t offset, uint32_t align, + void (*cb)(void *opaque, int ret), void *opaque); MemTxResult dma_buf_read(void *ptr, dma_addr_t len, dma_addr_t *residual, QEMUSGList *sg, MemTxAttrs attrs); MemTxResult dma_buf_write(void *ptr, dma_addr_t len, dma_addr_t *residual, diff --git a/system/dma-helpers.c b/system/dma-helpers.c index 9b221cf94e..908aff9bc0 100644 --- a/system/dma-helpers.c +++ b/system/dma-helpers.c @@ -274,6 +274,23 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk, DMA_DIRECTION_TO_DEVICE); } +static +BlockAIOCB *dma_blk_zone_append_io_func(int64_t offset, QEMUIOVector *iov, + BlockCompletionFunc *cb, void *cb_opaque, + void *opaque) +{ + BlockBackend *blk = opaque; + return blk_aio_zone_append(blk, (int64_t *)offset, iov, 0, cb, cb_opaque); +} + +BlockAIOCB *dma_blk_zone_append(BlockBackend *blk, + QEMUSGList *sg, int64_t offset, uint32_t align, + void (*cb)(void *opaque, int ret), void *opaque) +{ + return dma_blk_io(blk_get_aio_context(blk), sg, offset, align, + dma_blk_zone_append_io_func, blk, cb, opaque, + DMA_DIRECTION_TO_DEVICE); +} static MemTxResult dma_buf_rw(void *buf, dma_addr_t len, dma_addr_t *residual, QEMUSGList *sg, DMADirection dir, From patchwork Mon Jan 22 19:00:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Sam Li X-Patchwork-Id: 13526115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3A328C47DD3 for ; Mon, 22 Jan 2024 19:02:02 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rRzXD-0005hE-D7; Mon, 22 Jan 2024 14:00:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rRzXB-0005gd-Qg; Mon, 22 Jan 2024 14:00:37 -0500 Received: from mail-ej1-x633.google.com ([2a00:1450:4864:20::633]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rRzX9-00025P-PO; Mon, 22 Jan 2024 14:00:37 -0500 Received: by mail-ej1-x633.google.com with SMTP id a640c23a62f3a-a2821884a09so264049966b.2; Mon, 22 Jan 2024 11:00:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705950033; x=1706554833; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yX93vARGE4EZ8dVsl+J/Ugdwgba6V1YzNVL4jgpJfRk=; b=P4OxwImrrls8c1M/n7Pw/t/0L0doFOnDPJqJuBwUPMMLsQX3Smj4cRhv+XLpKoAG46 to9QB4UITrT5Y+snzIm8wZToOFRocGegR299DLVNojYvzPD1cLjeLamY2YkCOhmekspy iIiCk31NJdQsk+Exx3jT4XdvIuytajaoqbJv4MaYSADU3bR0/AA6kUelgaYBizhHLufa ZNjQnQYQrNx8JwX1IQF7aeeF3xCZrg4z2JBEKfjYZuSU2BfE8TFhjD1YSLQ4KvrnJ6ea pYZgxSyut0a4F3aIbG6257NUaeLSzNS8MTWx1uLkuTL+I7fWMw01zaGT8LCPyOGm5iPB NjdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705950033; x=1706554833; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yX93vARGE4EZ8dVsl+J/Ugdwgba6V1YzNVL4jgpJfRk=; b=TeUmVnVdXYyE1a6FB+9K+Zf7gIG8r+L0xlQfmbDGy+8IZfQk+0HOXm9Nvxp4J8LT0S JyQVx+y/llhsg4DD6BbRs/UHQ31FqALR4KoA0Nj55Umed+CyAfY+G/kIKE03pUz5bvEw 4msDCDUUcOjcUcFT7ztmLWy6XTKj9BxqleTFVW5Q+aDRIagGlOIU3pStaqNcWCcxtuLk M+gtAkGLrDJ39Apt5ZGCAjvvp3qd4B5Vr2UKR6UAGufihVcIboIUdZqaChg0lYHeZio/ rMDparAhrCQbEndc/Jv9VwrYuC+JssZsc896hxYnuXt0ZdTjP70jIqfSI0Q4kxbGPZ/t 0H9A== X-Gm-Message-State: AOJu0Yw/PqZT41B4JwFb04uXY9VxWi7HsAa+W+B/SQkHj/bbOIgIGfUP cUeTnQDw/cIEKBLSUdcR3pg+zWh8AP5t+laQHdTBZnCGL7gJ8TpgJnt+v2UlZ4k= X-Google-Smtp-Source: AGHT+IE42KsObPWWVwtqZfQZbkSl3qEB2yLz+M/xoPEcqzhATBev0VdaQ3FX8OwE7mvBMlyfQ3VZSw== X-Received: by 2002:a17:906:1913:b0:a26:c376:d1dc with SMTP id a19-20020a170906191300b00a26c376d1dcmr2742328eje.70.1705950033257; Mon, 22 Jan 2024 11:00:33 -0800 (PST) Received: from localhost.localdomain ([2a02:2454:367:1500:fa08:d4d:b569:ac2d]) by smtp.gmail.com with ESMTPSA id k3-20020a170906a38300b00a298d735a1bsm13842413ejz.149.2024.01.22.11.00.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 11:00:32 -0800 (PST) From: Sam Li To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, Kevin Wolf , Paolo Bonzini , stefanha@redhat.com, Peter Xu , David Hildenbrand , dmitry.fomichev@wdc.com, hare@suse.de, Hanna Reitz , Eric Blake , Markus Armbruster , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , dlemoal@kernel.org, Keith Busch , Klaus Jensen , Sam Li Subject: [RFC v3 7/7] hw/nvme: make ZDED persistent Date: Mon, 22 Jan 2024 20:00:13 +0100 Message-Id: <20240122190013.41302-8-faithilikerun@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240122190013.41302-1-faithilikerun@gmail.com> References: <20240122190013.41302-1-faithilikerun@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::633; 
envelope-from=faithilikerun@gmail.com; helo=mail-ej1-x633.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Zone descriptor extension data (ZDED) is currently not persistent across QEMU restarts. The zone descriptor extension valid bit (ZDEV) is part of the zone attributes and is set to one when ZDED is associated with the zone. With a qcow2 image as the backing file, the NVMe ZNS device stores the zone attributes in the eight bits following the zone type bit of each zone's write pointer. The ZDED itself is stored as part of the zoned metadata, as the write pointers are. Signed-off-by: Sam Li --- block/qcow2.c | 45 ++++++++++++++++++++++++++++++++++++ hw/nvme/ctrl.c | 1 + include/block/block-common.h | 1 + 3 files changed, 47 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index 43ee0f47b9..f2d58d86c4 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -25,6 +25,7 @@ #include "qemu/osdep.h" #include "block/qdict.h" +#include "block/nvme.h" #include "sysemu/block-backend.h" #include "qemu/main-loop.h" #include "qemu/module.h" @@ -197,6 +198,17 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp) #define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59) +static inline void qcow2_set_za(uint64_t *wp, uint8_t za) +{ + /* + * The zone attribute takes up one byte. Store it after the zone type + * bit. 
+ */ + uint64_t addr = *wp; + addr |= ((uint64_t)za << 51); + *wp = addr; +} + /* * To emulate a real zoned device, closed, empty and full states are * preserved after a power cycle. The open states are in-memory and will @@ -5053,6 +5065,36 @@ unlock: return ret; } +static int coroutine_fn GRAPH_RDLOCK +qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index) +{ + BDRVQcow2State *s = bs->opaque; + int ret; + + qemu_co_mutex_lock(&bs->wps->colock); + uint64_t *wp = &bs->wps->wp[index]; + BlockZoneState zs = qcow2_get_zone_state(bs, index); + if (zs == BLK_ZS_EMPTY) { + if (!qcow2_can_activate_zone(bs)) { + goto unlock; + } + + qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID); + ret = qcow2_write_wp_at(bs, wp, index); + if (ret < 0) { + error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp); + goto unlock; + } + s->nr_zones_closed++; + qemu_co_mutex_unlock(&bs->wps->colock); + return ret; + } + +unlock: + qemu_co_mutex_unlock(&bs->wps->colock); + return NVME_ZONE_INVAL_TRANSITION; +} + static int coroutine_fn GRAPH_RDLOCK qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, int64_t offset, int64_t len) @@ -5110,6 +5152,9 @@ qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, case BLK_ZO_OFFLINE: /* There are no transitions from the offline state to any other state */ break; + case BLK_ZO_SET_ZDED: + ret = qcow2_zns_set_zded(bs, index); + break; default: error_report("Unsupported zone op: 0x%x", op); ret = -ENOTSUP; diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index de41d8bac8..2799a3ac31 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -3465,6 +3465,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) break; case NVME_ZONE_ACTION_SET_ZD_EXT: + op = BLK_ZO_SET_ZDED; int zd_ext_size = blk_get_zd_ext_size(blk); trace_pci_nvme_set_descriptor_extension(slba, zone_idx); if (all || !zd_ext_size) { diff --git a/include/block/block-common.h b/include/block/block-common.h index 7690b05149..7c501e053e 100644 --- a/include/block/block-common.h 
+++ b/include/block/block-common.h @@ -88,6 +88,7 @@ typedef enum BlockZoneOp { BLK_ZO_FINISH, BLK_ZO_RESET, BLK_ZO_OFFLINE, + BLK_ZO_SET_ZDED, } BlockZoneOp; typedef enum BlockZoneModel {