From patchwork Mon Nov 14 22:46:50 2022
From: Eric Blake
To: nbd@other.debian.org
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com
Subject: [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
Date: Mon, 14 Nov 2022 16:46:50 -0600
Message-Id: <20221114224655.2186173-2-eblake@redhat.com>
In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com>
References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com>

The spec was silent on how many extents a server could reply with.
However, both qemu and nbdkit (the two server implementations known to
have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
cap, and will truncate the number of extents in a reply to avoid
sending a client a reply so large that the client would treat it as a
denial of service attack. Clients currently have no way during
negotiation to request such a limit of the server, so it is easier to
just document this as a restriction on viable server implementations
than to add yet another round of handshaking. Also, mentioning
amplification effects is worthwhile.

When qemu first implemented NBD_CMD_BLOCK_STATUS for the
base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
responded with more than one extent. Later, when adding its
qemu:dirty-bitmap:XYZ context extension (qemu commit 3d068aff16, Jun
2018), it added a cap of 128k extents (1M+4 bytes), and that cap was
applied to base:allocation once qemu started sending multiple extents
for that context as well (qemu commit fb7afc797e, Jul 2018). Qemu
extents are never smaller than 512 bytes (other than an exception at
the end of a file whose size is not aligned to 512), but even so, a
request for just under 4G of block status could produce 8M extents,
resulting in a reply of 64M if it were not capped smaller.

When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit commit
4ca66f70a5, Mar 2019), it did not impose any restriction on the number
of extents in the reply chunk. But because it allows extents as small
as one byte, it is easy to write a server that can amplify a client's
request for status over 1M of the image into a reply over 8M in size,
and it was very easy to demonstrate that a hard cap was needed to
avoid crashing clients or otherwise killing the connection (a bad
server impacting the client negatively). So nbdkit enforced a bound of
1M extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019).
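The size figures quoted above can be reproduced with a small C sketch. It assumes the compact (non-extended-header) reply layout, in which an `NBD_REPLY_TYPE_BLOCK_STATUS` payload is a 4-byte metadata context id followed by 8-byte descriptors (a 32-bit extent length plus 32-bit status flags). The helper names are hypothetical and do not come from qemu or nbdkit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions, not taken from qemu or nbdkit source:
 * - a compact NBD_REPLY_TYPE_BLOCK_STATUS payload is a 32-bit metadata
 *   context id followed by a list of descriptors;
 * - each descriptor is 8 bytes: 32-bit extent length + 32-bit flags. */
#define BS_CONTEXT_ID_BYTES 4u
#define BS_DESCRIPTOR_BYTES 8u

/* Cap recommended by this patch: at most 2^20 descriptors per chunk. */
#define BS_MAX_EXTENTS (UINT64_C(1) << 20)

/* Payload size in bytes of one chunk carrying n_extents descriptors. */
uint64_t bs_payload_size(uint64_t n_extents)
{
    return BS_CONTEXT_ID_BYTES + BS_DESCRIPTOR_BYTES * n_extents;
}

/* Worst-case descriptor count for a request of request_len bytes when
 * every extent is min_extent bytes long; rounds up so that a short
 * trailing extent is still counted. */
uint64_t bs_worst_case_extents(uint64_t request_len, uint64_t min_extent)
{
    assert(min_extent > 0);
    return (request_len + min_extent - 1) / min_extent;
}

/* Descriptor count a server honouring a cap would actually send; the
 * client can then request status for the remainder of the range. */
uint64_t bs_capped_extents(uint64_t n_extents, uint64_t cap)
{
    return n_extents < cap ? n_extents : cap;
}
```

With these helpers, a request covering just under 4G at 512-byte granularity works out to 2^23 descriptors and a 67,108,868-byte payload, while clamping to 2^20 descriptors keeps the chunk at 8,388,612 bytes.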

[Unrelated to this patch, but worth noting for history: nbdkit's
situation also has to deal with the fact that it is designed for
plugin server implementations, and not capping the number of extents
in a reply also posed a problem to nbdkit as the server, where a
plugin could exhaust memory and kill the server, unrelated to any size
constraints enforced by a client.]

Since the limit chosen by these two implementations is different, and
since nbdkit has versions that were not limited, add this as a SHOULD
NOT instead of MUST NOT constraint on servers implementing block
status. It does not matter that qemu picked a smaller limit that it
truncates to, since we have already documented that the server may
truncate for other reasons (such as it being inefficient to collect
that many extents in the first place). But documenting the limit now
becomes even more important in the face of a future addition of 64-bit
requests, where a client's request is no longer bounded to 4G and
could thereby produce even more than 8M extents for the corner case
when every 512 bytes is a new extent, if it were not for this
recommendation.

Reviewed-by: Vladimir Sementsov-Ogievskiy
---
v2: Add wording about amplification effect
---
 doc/proto.md | 51 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3a96703..8f08583 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -1818,6 +1818,12 @@ MUST initiate a hard disconnect.
   the different contexts need not have the same number of extents or
   cumulative extent length.

+  Servers SHOULD NOT send more than 2^20 extents in a single reply
+  chunk; in other words, the size of
+  `NBD_REPLY_TYPE_BLOCK_STATUS` should not be more than 4 + 8*2^20
+  (8,388,612 bytes), even if this requires that the server truncate
+  the response in relation to the *length* requested by the client.
+
   Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in its
   request, the server MAY return fewer descriptors in the reply than
   would be required to fully specify the whole range of requested
@@ -2180,26 +2186,31 @@ The following request types exist:
     `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions of
     the file starting from specified *offset*. If the client used the
     `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
-    descriptor where the *length* of the descriptor MUST NOT be greater
-    than the *length* of the request; otherwise, a chunk MAY contain
-    multiple descriptors, and the final descriptor MAY extend beyond
-    the original requested size if the server can determine a larger
-    length without additional effort. On the other hand, the server MAY
-    return less data than requested. However the server MUST return at
-    least one status descriptor (and since each status descriptor has
-    a non-zero length, a client can always make progress on a
-    successful return). The server SHOULD use different *status*
-    values between consecutive descriptors where feasible, although
-    the client SHOULD be prepared to handle consecutive descriptors
-    with the same *status* value. The server SHOULD use descriptor
-    lengths that are an integer multiple of 512 bytes where possible
-    (the first and last descriptor of an unaligned query being the
-    most obvious places for an exception), and MUST use descriptor
-    lengths that are an integer multiple of any advertised minimum
-    block size. The status flags are intentionally defined so that a
-    server MAY always safely report a status of 0 for any block,
-    although the server SHOULD return additional status values when
-    they can be easily detected.
+    descriptor where the *length* of the descriptor MUST NOT be
+    greater than the *length* of the request; otherwise, a chunk MAY
+    contain multiple descriptors, and the final descriptor MAY extend
+    beyond the original requested size if the server can determine a
+    larger length without additional effort. On the other hand, the
+    server MAY return less data than requested. In particular, a
+    server SHOULD NOT send more than 2^20 status descriptors in a
+    single chunk. However the server MUST return at least one status
+    descriptor, and since each status descriptor has a non-zero
+    length, a client can always make progress on a successful return.
+
+    The server SHOULD use different *status* values between
+    consecutive descriptors where feasible, although the client SHOULD
+    be prepared to handle consecutive descriptors with the same
+    *status* value. The server SHOULD use descriptor lengths that are
+    an integer multiple of 512 bytes where possible (the first and
+    last descriptor of an unaligned query being the most obvious
+    places for an exception), in part to avoid an amplification effect
+    where a series of smaller descriptors can cause the server's reply
+    to occupy more bytes than the *length* of the client's request.
+    The server MUST use descriptor lengths that are an integer
+    multiple of any advertised minimum block size. The status flags
+    are intentionally defined so that a server MAY always safely
+    report a status of 0 for any block, although the server SHOULD
+    return additional status values when they can be easily detected.

     If an error occurs, the server SHOULD set the appropriate error
     code in the error field of an error chunk. However, if the error

From patchwork Mon Nov 14 22:46:54 2022
From: Eric Blake
To: nbd@other.debian.org
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com
Subject: [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
Date: Mon, 14 Nov 2022 16:46:54 -0600
Message-Id: <20221114224655.2186173-6-eblake@redhat.com>
In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com>
References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com>

NBD_CMD_BLOCK_STATUS currently forces the server to reply to all
metacontext ids that the client
negotiated via NBD_OPT_SET_META_CONTEXT. But since extended headers
make it easy for the client to pass command payloads, we can allow for
a client to negotiate multiple metacontexts up front but express
dynamic interest in varying subsets of those contexts over the life of
the connection, for less wasted effort in responding to
NBD_CMD_BLOCK_STATUS. This works by having the command payload supply
an effect length and a list of ids the client is currently interested
in.

Signed-off-by: Eric Blake
---
 doc/proto.md | 62 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 14af48d..645a736 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -397,17 +397,20 @@ additional bytes of payload are present), or if the flag is absent
 (there is no payload, and *length* instead is an effect length
 describing how much of the image the request operates on). The
 command `NBD_CMD_WRITE` MUST use the flag `NBD_CMD_FLAG_PAYLOAD_LEN`
-in this mode; while other commands SHOULD avoid the flag if the
-server has not indicated extension suppport for payloads on that
-command. A server SHOULD initiate hard disconnect if a client sets
-the `NBD_CMD_FLAG_PAYLOAD_LEN` flag and uses a *length* larger than
-a server's advertised or default maximum payload length (capped at
-32 bits by the constraints of `NBD_INFO_BLOCK_SIZE`); in all other
-cases, a server SHOULD gracefully consume *length* bytes of payload
-(even if it then replies with an `NBD_EINVAL` failure because the
-particular command was not expecting a payload), and proceed with
-the next client command. Thus, only when *length* is used as an
-effective length will it utilize a full 64-bit value.
+in this mode; most other commands omit it, although some like
+`NBD_CMD_BLOCK_STATUS` optionally support the flag in order to allow
+the client to pass additional information in the payload (where the
+command documents what the payload will contain, including the
+possibility of a separate effect length). A server SHOULD initiate
+hard disconnect if a client sets the `NBD_CMD_FLAG_PAYLOAD_LEN` flag
+and uses a *length* larger than a server's advertised or default
+maximum payload length (capped at 32 bits by the constraints of
+`NBD_INFO_BLOCK_SIZE`); in all other cases, a server SHOULD gracefully
+consume *length* bytes of payload (even if it then replies with an
+`NBD_EINVAL` failure because the particular command was not expecting
+a payload), and proceed with the next client command. Thus, only when
+*length* is used as an effective length will it utilize a full 64-bit
+value.

 #### Simple reply message

@@ -1232,6 +1235,19 @@ The field has the following format:
   will be faster than a regular write). Clients MUST NOT set the
   `NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission
   flag is set.
+- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
+  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
+  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
+  server filters its response to a specific subset of negotiated
+  metacontext ids passed in via a client payload, rather than the
+  default of replying to all metacontext ids. Servers MUST NOT
+  advertise this bit unless the client successfully negotiates
+  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
+  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
+  `NBD_OPT_GO` if the client does not negotiate metacontexts with
+  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
+  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
+  this transmission flag is set.

 Clients SHOULD ignore unknown flags.

@@ -1915,8 +1931,11 @@ valid may depend on negotiation during the handshake phase.
   header. With extended headers, the flag MUST be set for
   `NBD_CMD_WRITE` (as the write command always sends a payload of the
   bytes to be written); for other commands, the flag will trigger an
-  `NBD_EINVAL` error unless the server has advertised support for an
-  extension payload form for the command.
+  `NBD_EINVAL` error unless the command documents an optional payload
+  form for the command and the server has implemented that form (an
+  example being `NBD_CMD_BLOCK_STATUS` providing a payload form for
+  restricting the response to a particular metacontext id, when the
+  server advertises `NBD_FLAG_BLOCK_STATUS_PAYLOAD`).

 ##### Structured reply flags

@@ -2464,6 +2483,23 @@ The following request types exist:
     The server MAY send chunks in a different order than the context
     ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`.

+    If extended headers were negotiated, a server MAY optionally
+    advertise, via the transmission flag
+    `NBD_FLAG_BLOCK_STATUS_PAYLOAD`, that it supports an alternative
+    request form where the client sets `NBD_CMD_FLAG_PAYLOAD_LEN` in
+    order to pass a payload that informs the server to limit its
+    replies to the metacontext id(s) in the client's request payload,
+    rather than giving an answer on all possible metacontext ids. If
+    the server does not support the payload form, or detects duplicate
+    or unknown metacontext ids in the client's payload, the server
+    MUST gracefully consume the client's payload before failing with
+    `NBD_EINVAL`. The payload form MUST occupy 8 + n*4 bytes, where n
+    is the number of metacontext ids the client is interested in (as
+    implied by the payload length), laid out as:
+
+        64 bits, effect length
+        n * 32 bits, list of metacontext ids to use

     The list of block status descriptors within a given status chunk
     represent consecutive portions of the file starting from specified
     *offset*. If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,

From patchwork Mon Nov 14 22:46:55 2022
From: Eric Blake
To: nbd@other.debian.org
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com
Subject: [PATCH v2 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
Date: Mon, 14 Nov 2022 16:46:55 -0600
Message-Id: <20221114224655.2186173-7-eblake@redhat.com>
In-Reply-To: <20221114224655.2186173-1-eblake@redhat.com>
References: <20221114224141.cm5jgyxfmvie5xb5@redhat.com> <20221114224655.2186173-1-eblake@redhat.com>

Rather than requiring all servers and clients to have a 32-bit limit
on maximum
NBD_CMD_READ/WRITE sizes, we can choose to standardize support for a
64-bit single I/O transaction now. NBD_REPLY_TYPE_OFFSET_DATA can
already handle a large reply, but NBD_REPLY_TYPE_OFFSET_HOLE needs a
64-bit counterpart. By standardizing this, all clients must be
prepared to handle both hole reply types, even though most server
implementations of extended replies are likely to only send one hole
type.
---
As this is a corner case that may get less testing, I have split it
into a separate optional patch. I implemented it in my
proof-of-concept, but am happy to drop this patch for what actually
goes upstream.

In particular, if we foresee clients and servers that WANT to support
a payload larger than 4G, it may be worth introducing an NBD_INFO_*
that supplies 64-bit block sizing information, rather than our current
inherent 32-bit limit of NBD_INFO_BLOCK_SIZE, at the same time as we
introduce this reply type.
---
 doc/proto.md | 73 ++++++++++++++++++++++++++++++++++++++++------------------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 645a736..9c04411 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -2008,19 +2008,25 @@ size.
       64 bits: offset (unsigned)
       32 bits: hole size (unsigned, MUST be nonzero)

-  At this time, although servers that support extended headers are
-  permitted to accept client requests for `NBD_CMD_READ` with an
-  effect length larger than any advertised maximum block payload size
-  by splitting the reply into multiple chunks, portable clients SHOULD
-  NOT request a read *length* larger than 32 bits (corresponding to
-  the maximum block payload constraint implied by
-  `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the
-  *hole size* does not represent an arbitrary limitation. Should a
-  future scenario arise where it can be demonstrated that a client and
-  server would benefit from an extension allowing a maximum block
-  payload size to be larger than 32 bits, that extension would also
-  introduce a counterpart reply type that can express a 64-bit *hole
-  size*.
+* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
+
+  This chunk type is in the content chunk category. *length* MUST be
+  exactly 16. The semantics of this chunk mirror those of
+  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
+  size* field. This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload is structured as:
+
+      64 bits: offset (unsigned)
+      64 bits: hole size (unsigned, MUST be nonzero)
+
+  Note that even though extended headers are in use, a server may
+  enforce a maximum block size that is smaller than 32 bits, in which
+  case no valid `NBD_CMD_READ` will have a *length* large enough to
+  require the use of this chunk type. However, a client using
+  extended headers MUST be prepared for the server to use either the
+  compact or extended chunk type.

 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)

@@ -2218,26 +2224,27 @@ The following request types exist:
     the following additional constraints.

     The server MAY split the reply into any number of content chunks
-    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`);
-    each chunk MUST describe at least one byte, although to minimize
-    overhead, the server SHOULD use chunks with lengths and offsets as
-    an integer multiple of 512 bytes, where possible (the first and
-    last chunk of an unaligned read being the most obvious places for
-    an exception). The server MUST NOT send content chunks that
-    overlap with any earlier content or error chunk, and MUST NOT send
-    chunks that describe data outside the offset and length of the
-    request, but MAY send the content chunks in any order (the client
-    MUST reassemble content chunks into the correct order), and MAY
-    send additional content chunks even after reporting an error
-    chunk. A server MAY support read requests larger than the maximum
-    block payload size by splitting the response across multiple
-    chunks (in particular, if extended headers are not in use, a
-    request for more than 2^32 - 8 bytes containing data rather than
-    holes MUST be split to avoid overflowing the 32-bit
-    `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
-    also permitted to reject large read requests up front, so a client
-    should be prepared to retry with smaller requests if a large
-    request fails.
+    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE` for
+    structured replies, additionally `NBD_REPLY_TYPE_OFFSET_HOLE_EXT`
+    for extended headers); each chunk MUST describe at least one byte,
+    although to minimize overhead, the server SHOULD use chunks with
+    lengths and offsets as an integer multiple of 512 bytes, where
+    possible (the first and last chunk of an unaligned read being the
+    most obvious places for an exception). The server MUST NOT send
+    content chunks that overlap with any earlier content or error
+    chunk, and MUST NOT send chunks that describe data outside the
+    offset and length of the request, but MAY send the content chunks
+    in any order (the client MUST reassemble content chunks into the
+    correct order), and MAY send additional content chunks even after
+    reporting an error chunk. A server MAY support read requests
+    larger than the maximum block payload size by splitting the
+    response across multiple chunks (in particular, if extended
+    headers are not in use, a request for more than 2^32 - 8 bytes
+    containing data rather than holes MUST be split to avoid
+    overflowing the 32-bit `NBD_REPLY_TYPE_OFFSET_DATA` length field);
+    however, the server is also permitted to reject large read
+    requests up front, so a client should be prepared to retry with
+    smaller requests if a large request fails.

     When no error is detected, the server MUST send enough data chunks
     to cover the entire region described by the offset and length of