From patchwork Wed Aug 16 07:08:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13354629
From: Sam Li
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Klaus Jensen, Markus Armbruster, Hanna Reitz,
 Peter Xu, David Hildenbrand, dlemoal@kernel.org, Keith Busch,
 Philippe Mathieu-Daudé, Eric Blake, hare@suse.de, Kevin Wolf,
 stefanha@redhat.com, Paolo Bonzini, dmitry.fomichev@wdc.com, Sam Li
Subject: [RFC 4/5] hw/nvme: refactor zone append writes using block layer APIs
Date: Wed, 16 Aug 2023 15:08:41 +0800
Message-Id: <20230816070842.5423-1-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1

Signed-off-by: Sam Li
---
 block/block-backend.c             |   8 ++
 block/qcow2.c                     |   7 +-
 hw/nvme/ctrl.c                    | 195 ++++++++++++++++++++++--------
 include/sysemu/block-backend-io.h |   1 +
 include/sysemu/dma.h              |   3 +
 softmmu/dma-helpers.c             |  17 +++
 6 files changed, 181 insertions(+), 50 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9c95ae0267..2aafb4cee3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2426,6 +2426,14 @@ uint32_t blk_get_nr_zones(BlockBackend *blk)
     return bs ? bs->bl.nr_zones : 0;
 }
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.write_granularity : 0;
+}
+
 uint8_t *blk_get_zone_extension(BlockBackend *blk)
 {
     BlockDriverState * bs = blk_bs(blk);
     IO_CODE();
diff --git a/block/qcow2.c b/block/qcow2.c
index 41549dd68b..5a038792f1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2198,7 +2198,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_active_zones = s->zoned_header.max_active_zones;
     bs->bl.max_open_zones = s->zoned_header.max_open_zones;
     bs->bl.zone_size = s->zoned_header.zone_size;
-    bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+    bs->bl.write_granularity = 4096; /* physical block size */
 }
 
 static int qcow2_reopen_prepare(BDRVReopenState *state,
@@ -4915,6 +4915,11 @@ qcow2_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
     qemu_co_mutex_lock(&s->wps->colock);
     uint64_t wp = s->wps->wp[index];
     uint64_t wp_i = qcow2_get_wp(wp);
+    printf("qcow2 offset 0x%lx\n", *offset);
+    printf("checking wp[%ld]: 0b%lb\n", *offset / bs->bl.zone_size, wp);
+    for (int i = 0; i < bs->bl.nr_zones; i++) {
+        printf("Listing wp[%d]: 0b%lb\n", i, s->wps->wp[i]);
+    }
     ret = qcow2_co_pwritev_part(bs, wp_i, len, qiov, 0, 0);
     if (ret == 0) {
         *offset = wp_i;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8d4c08dc4c..3932b516ed 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1740,6 +1740,95 @@ static void nvme_misc_cb(void *opaque, int ret)
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+    NvmeRequest *req;
+    NvmeCmd *cmd;
+    NvmeCtrl *n;
+
+    union {
+        struct {
+            uint32_t partial;
+            unsigned int nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report_data;
+        struct {
+            int64_t offset;
+        } zone_append_data;
+    };
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *cb = opaque;
+    NvmeRequest *req = cb->req;
+    int64_t *offset = (int64_t *)&req->cqe;
+
+    if (ret) {
+        nvme_aio_err(req, ret);
+    }
+
+    *offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+    nvme_enqueue_req_completion(nvme_cq(req), req);
+    g_free(cb);
+}
+
+static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset,
+                                        uint32_t align,
+                                        BlockCompletionFunc *cb,
+                                        NvmeZoneCmdAIOCB *aiocb)
+{
+    NvmeRequest *req = aiocb->req;
+    assert(req->sg.flags & NVME_SG_ALLOC);
+
+    if (req->sg.flags & NVME_SG_DMA) {
+        req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset,
+                                         align, cb, aiocb);
+    } else {
+        req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0,
+                                         cb, aiocb);
+    }
+}
+
+static void nvme_zone_append_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *aiocb = opaque;
+    NvmeRequest *req = aiocb->req;
+    NvmeNamespace *ns = req->ns;
+
+    BlockBackend *blk = ns->blkconf.blk;
+
+    trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
+
+    if (ret) {
+        goto out;
+    }
+
+    if (ns->lbaf.ms) {
+        NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+        uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+        int64_t offset = aiocb->zone_append_data.offset;
+
+        if (nvme_ns_ext(ns) || req->cmd.mptr) {
+            uint16_t status;
+
+            nvme_sg_unmap(&req->sg);
+            status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+            if (status) {
+                ret = -EFAULT;
+                goto out;
+            }
+
+            return nvme_blk_zone_append(blk, &offset, 1,
+                                        nvme_blk_zone_append_complete_cb,
+                                        aiocb);
+        }
+    }
+
+out:
+    nvme_blk_zone_append_complete_cb(aiocb, ret);
+}
+
+
 void nvme_rw_complete_cb(void *opaque, int ret)
 {
     NvmeRequest *req = opaque;
@@ -3067,6 +3156,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     uint64_t mapped_size = data_size;
     uint64_t data_offset;
     BlockBackend *blk = ns->blkconf.blk;
+    BlockZoneWps *wps = blk_get_zone_wps(blk);
+    uint32_t zone_size = blk_get_zone_size(blk);
+    uint32_t zone_idx;
     uint16_t status;
 
     if (nvme_ns_ext(ns)) {
@@ -3097,42 +3189,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     }
 
     if (blk_get_zone_model(blk)) {
-        uint32_t zone_size = blk_get_zone_size(blk);
-        uint32_t zone_idx = slba / zone_size;
-        int64_t zone_start = zone_idx * zone_size;
+        assert(wps);
+        if (zone_size) {
+            zone_idx = slba / zone_size;
+            int64_t zone_start = zone_idx * zone_size;
+
+            if (append) {
+                bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+
+                if (n->params.zasl &&
+                    data_size > (uint64_t)
+                    n->page_size << n->params.zasl) {
+                    trace_pci_nvme_err_zasl(data_size);
+                    return NVME_INVALID_FIELD | NVME_DNR;
+                }
 
-        if (append) {
-            bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+                rw->slba = cpu_to_le64(slba);
 
-            if (n->params.zasl &&
-                data_size > (uint64_t)n->page_size << n->params.zasl) {
-                trace_pci_nvme_err_zasl(data_size);
-                return NVME_INVALID_FIELD | NVME_DNR;
-            }
+                switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+                case NVME_ID_NS_DPS_TYPE_1:
+                    if (!piremap) {
+                        return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    }
 
-            rw->slba = cpu_to_le64(slba);
-            switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
-            case NVME_ID_NS_DPS_TYPE_1:
-                if (!piremap) {
-                    return NVME_INVALID_PROT_INFO | NVME_DNR;
-                }
+                    /* fallthrough */
 
-                /* fallthrough */
+                case NVME_ID_NS_DPS_TYPE_2:
+                    if (piremap) {
+                        uint32_t reftag = le32_to_cpu(rw->reftag);
+                        rw->reftag =
+                            cpu_to_le32(reftag + (slba - zone_start));
+                    }
 
-            case NVME_ID_NS_DPS_TYPE_2:
-                if (piremap) {
-                    uint32_t reftag = le32_to_cpu(rw->reftag);
-                    rw->reftag = cpu_to_le32(reftag + (slba - zone_start));
-                }
+                    break;
 
-                break;
+                case NVME_ID_NS_DPS_TYPE_3:
+                    if (piremap) {
+                        return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    }
 
-            case NVME_ID_NS_DPS_TYPE_3:
-                if (piremap) {
-                    return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    break;
                 }
-
-                break;
             }
         }
 
@@ -3152,9 +3249,21 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
             goto invalid;
         }
 
-        block_acct_start(blk_get_stats(blk), &req->acct, data_size,
-                         BLOCK_ACCT_WRITE);
-        nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+        if (append) {
+            NvmeZoneCmdAIOCB *cb = g_malloc(sizeof(NvmeZoneCmdAIOCB));
+            cb->req = req;
+            cb->zone_append_data.offset = data_offset;
+
+            block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                             BLOCK_ACCT_ZONE_APPEND);
+            nvme_blk_zone_append(blk, &cb->zone_append_data.offset,
+                                 blk_get_write_granularity(blk),
+                                 nvme_zone_append_cb, cb);
+        } else {
+            block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                             BLOCK_ACCT_WRITE);
+            nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+        }
     } else {
         req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
                                            BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
@@ -3178,24 +3287,7 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
     return nvme_do_write(n, req, false, true);
 }
 
-typedef struct NvmeZoneCmdAIOCB {
-    NvmeRequest *req;
-    NvmeCmd *cmd;
-    NvmeCtrl *n;
-
-    union {
-        struct {
-            uint32_t partial;
-            unsigned int nr_zones;
-            BlockZoneDescriptor *zones;
-        } zone_report_data;
-        struct {
-            int64_t offset;
-        } zone_append_data;
-    };
-} NvmeZoneCmdAIOCB;
-
-static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
 {
     return nvme_do_write(n, req, true, false);
 }
@@ -3333,6 +3425,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
     NvmeNamespace *ns = req->ns;
     NvmeZoneMgmtAIOCB *iocb;
     uint64_t slba = 0;
+    uint64_t offset;
+    BlockBackend *blk = ns->blkconf.blk;
+    uint32_t zone_size = blk_get_zone_size(blk);
+    uint64_t size = zone_size * blk_get_nr_zones(blk);
+    int64_t len;
     uint32_t zone_idx = 0;
     uint16_t status;
     uint8_t action = cmd->zsa;
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index f69aa1094a..fcbdd93dea 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -109,6 +109,7 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk);
 uint32_t blk_get_nr_zones(BlockBackend *blk);
 uint8_t *blk_get_zone_extension(BlockBackend *blk);
 uint32_t blk_get_zd_ext_size(BlockBackend *blk);
+uint32_t blk_get_write_granularity(BlockBackend *blk);
 BlockZoneWps *blk_get_zone_wps(BlockBackend *blk);
 
 void blk_io_plug(void);
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index a1ac5bc1b5..680e0b5477 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -301,6 +301,9 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t offset, uint32_t align,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                                QEMUSGList *sg, int64_t offset, uint32_t align,
+                                void (*cb)(void *opaque, int ret), void *opaque);
 MemTxResult dma_buf_read(void *ptr, dma_addr_t len, dma_addr_t *residual,
                          QEMUSGList *sg, MemTxAttrs attrs);
 MemTxResult dma_buf_write(void *ptr, dma_addr_t len, dma_addr_t *residual,
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 2463964805..88bc13264b 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -282,6 +282,23 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
+static
+BlockAIOCB *dma_blk_zone_append_io_func(int64_t offset, QEMUIOVector *iov,
+                                        BlockCompletionFunc *cb, void *cb_opaque,
+                                        void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_zone_append(blk, (int64_t *)offset, iov, 0, cb, cb_opaque);
+}
+
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                                QEMUSGList *sg, int64_t offset, uint32_t align,
+                                void (*cb)(void *opaque, int ret), void *opaque)
+{
+    return dma_blk_io(blk_get_aio_context(blk), sg, offset, align,
+                      dma_blk_zone_append_io_func, blk, cb, opaque,
+                      DMA_DIRECTION_TO_DEVICE);
+}
 
 static MemTxResult dma_buf_rw(void *buf, dma_addr_t len, dma_addr_t *residual,
                               QEMUSGList *sg, DMADirection dir,

From patchwork Wed Aug 16 07:08:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sam Li
X-Patchwork-Id: 13354630
From: Sam Li
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Klaus Jensen, Markus Armbruster, Hanna Reitz,
 Peter Xu, David Hildenbrand, dlemoal@kernel.org, Keith Busch,
 Philippe Mathieu-Daudé, Eric Blake, hare@suse.de, Kevin Wolf,
 stefanha@redhat.com, Paolo Bonzini, dmitry.fomichev@wdc.com, Sam Li
Subject: [RFC 5/5] hw/nvme: make ZDED persistent
Date: Wed, 16 Aug 2023 15:08:42 +0800
Message-Id: <20230816070842.5423-2-faithilikerun@gmail.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230816070842.5423-1-faithilikerun@gmail.com>
References: <20230816070842.5423-1-faithilikerun@gmail.com>

Zone descriptor extension data (ZDED) is currently not persistent across
QEMU restarts. The zone descriptor extension valid bit (ZDEV) is part of
the zone attributes and is set to one when ZDED is associated with the
zone.

With a qcow2 ZNS file as the backing file, the NVMe ZNS device stores the
zone attributes in the eight bits that follow the zoned bit of each zone's
write pointer. The ZDED itself is stored as part of the zoned metadata, in
the same way as the write pointers.

Signed-off-by: Sam Li
---
 block/qcow2.c                | 44 +++++++++++++++++++++++++++++++++++-
 hw/nvme/ctrl.c               |  6 +----
 include/block/block-common.h |  1 +
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5a038792f1..ac5ecef559 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -214,6 +215,17 @@ static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
     *wp = addr;
 }
 
+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+    /*
+     * The zone attribute takes up one byte. Store it after the zoned
+     * bit.
+     */
+    uint64_t addr = *wp;
+    addr |= ((uint64_t)za << 51);
+    *wp = addr;
+}
+
 /*
  * File wp tracking: reset zone, finish zone and append zone can
  * change the value of write pointer. All zone operations will change
@@ -308,7 +320,7 @@ static int qcow2_check_open(BlockDriverState *bs)
 
 /*
  * The zoned device has limited zone resources of open, closed, active
- * zones.
+ * zones. Check if we can manage a zone without exceeding those limits.
  */
 static int qcow2_check_zone_resources(BlockDriverState *bs,
                                       BlockZoneState zs)
@@ -4801,6 +4813,33 @@ unlock:
     return ret;
 }
 
+static int qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&s->wps->colock);
+    uint64_t *wp = &s->wps->wp[index];
+    BlockZoneState zs = qcow2_get_zs(*wp);
+    if (zs == BLK_ZS_EMPTY) {
+        ret = qcow2_check_zone_resources(bs, zs);
+        if (ret < 0) {
+            return ret;
+        }
+
+        qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID);
+        ret = qcow2_write_wp_at(bs, wp, index, BLK_ZO_CLOSE);
+        if (ret < 0) {
+            error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp);
+            return ret;
+        }
+        s->nr_zones_closed++;
+        return ret;
+    }
+
+    return NVME_ZONE_INVAL_TRANSITION;
+}
+
 static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
                                            int64_t offset, int64_t len)
 {
@@ -4857,6 +4896,9 @@ static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     case BLK_ZO_OFFLINE:
         ret = qcow2_write_wp_at(bs, &wps->wp[index], index, BLK_ZO_OFFLINE);
         break;
+    case BLK_ZO_SET_ZDED:
+        ret = qcow2_zns_set_zded(bs, index);
+        break;
     default:
         error_report("Unsupported zone op: 0x%x", op);
         ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 3932b516ed..fcd774e3f7 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3425,11 +3425,6 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
     NvmeNamespace *ns = req->ns;
     NvmeZoneMgmtAIOCB *iocb;
     uint64_t slba = 0;
-    uint64_t offset;
-    BlockBackend *blk = ns->blkconf.blk;
-    uint32_t zone_size = blk_get_zone_size(blk);
-    uint64_t size = zone_size * blk_get_nr_zones(blk);
-    int64_t len;
     uint32_t zone_idx = 0;
     uint16_t status;
     uint8_t action = cmd->zsa;
@@ -3485,6 +3480,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
         break;
 
     case NVME_ZONE_ACTION_SET_ZD_EXT:
+        op = BLK_ZO_SET_ZDED;
         int zd_ext_size = blk_get_zd_ext_size(blk);
         trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
         if (all || !zd_ext_size) {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0cbed607a8..b369e77607 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -84,6 +84,7 @@ typedef enum BlockZoneOp {
     BLK_ZO_FINISH,
     BLK_ZO_RESET,
     BLK_ZO_OFFLINE,
+    BLK_ZO_SET_ZDED,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
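
As a rough illustration of the layout that the [RFC 5/5] commit message describes, the sketch below packs a one-byte zone attribute field into a 64-bit write pointer word. Only the shift of 51 and the one-byte width are taken from qcow2_set_za() in the patch; the macro and helper names, and the 0x80 value standing in for NVME_ZA_ZD_EXT_VALID, are assumptions made for this example and are not part of the series. Unlike the patch, the setter here clears the field before OR-ing in the new value, so stale attribute bits cannot survive an update.

```c
#include <stdint.h>

/*
 * Hypothetical names for illustration only. The shift (51) and the
 * one-byte field width mirror qcow2_set_za() in the patch; nothing
 * else about the write pointer layout is assumed here.
 */
#define WP_ZA_SHIFT      51
#define WP_ZA_MASK       (UINT64_C(0xff) << WP_ZA_SHIFT)
#define ZA_ZD_EXT_VALID  0x80 /* assumed stand-in for NVME_ZA_ZD_EXT_VALID */

/* Return wp with its attribute byte replaced by za. */
static inline uint64_t wp_set_za(uint64_t wp, uint8_t za)
{
    wp &= ~WP_ZA_MASK;                         /* drop any old attributes */
    return wp | ((uint64_t)za << WP_ZA_SHIFT); /* store the new byte */
}

/* Extract the attribute byte from wp. */
static inline uint8_t wp_get_za(uint64_t wp)
{
    return (uint8_t)((wp & WP_ZA_MASK) >> WP_ZA_SHIFT);
}

/* Return wp with the attribute byte cleared. */
static inline uint64_t wp_strip_za(uint64_t wp)
{
    return wp & ~WP_ZA_MASK;
}
```

With these helpers, marking a zone as carrying a descriptor extension would be wp = wp_set_za(wp, ZA_ZD_EXT_VALID), and wp_strip_za(wp) recovers the word without the attribute byte. Clearing before setting is a design choice of this sketch: a plain OR, as in the patch, can only ever turn attribute bits on.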