From patchwork Fri Dec 3 23:14:34 2021
X-Patchwork-Submitter: Eric Blake
X-Patchwork-Id: 12655973
From: Eric Blake
To: nbd@other.debian.org
Subject: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
Date: Fri, 3 Dec 2021 17:14:34 -0600
Message-Id: <20211203231434.3900824-1-eblake@redhat.com>
In-Reply-To: <20211203231307.wmtbw7r72tyzkkax@redhat.com>
References: <20211203231307.wmtbw7r72tyzkkax@redhat.com>
Cc: nsoffer@redhat.com, vsementsov@virtuozzo.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, libguestfs@redhat.com

Add a new negotiation
feature where the client and server agree to use larger packet headers
on every packet sent during the transmission phase. This has two
purposes.

First, it makes it possible to perform operations like trim, write
zeroes, and block status on more than 2^32 bytes in a single command;
this in turn requires that some structured replies from the server
also be extended to match. The wording chosen here is careful to
permit a server to use either flavor in its reply (that is, a request
of less than 32 bits can trigger an extended reply, and conversely a
request larger than 32 bits can trigger a compact reply).

Second, when structured replies are active, clients have to deal with
the difference between the 16- and 20-byte headers of simple
vs. structured replies, which impacts performance if the client must
perform multiple syscalls to first read the magic before knowing how
many additional bytes to read. In extended header mode, all headers
are the same width, so the client can read a full header before
deciding whether the header describes a simple or structured reply.
Similarly, by having extended mode use a power-of-2 sizing, it becomes
easier to manipulate headers within a single cache line, even if it
requires padding bytes sent over the wire. However, note that this
change only affects the headers; as data payloads can still be
unaligned (for example, a client performing 1-byte reads or writes),
we would need to negotiate yet another extension if we wanted to
ensure that all NBD transmission packets started on an 8-byte boundary
after option haggling has completed.

This spec addition was done in parallel with a proof-of-concept
implementation in qemu (server and client) and libnbd (client), and I
also have plans to implement it in nbdkit (server).
Signed-off-by: Eric Blake
---
Available at https://repo.or.cz/nbd/ericb.git/shortlog/refs/tags/exthdr-v1

 doc/proto.md | 218 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 177 insertions(+), 41 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3a877a9..46560b6 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -295,6 +295,21 @@ reply is also problematic for error handling of the `NBD_CMD_READ`
 request. Therefore, structured replies can be used to create a
 context-free server stream; see below.
 
+The results of client negotiation also determine whether the client
+and server will utilize only compact requests and replies, or whether
+both sides will use only extended packets. Compact messages are the
+default, but inherently limit single transactions to a 32-bit window
+starting at a 64-bit offset. Extended messages make it possible to
+perform 64-bit transactions (although typically only for commands that
+do not include a data payload). Furthermore, when structured replies
+have been negotiated, compact messages require the client to perform
+partial reads to determine which reply packet style (simple or
+structured) is on the wire before knowing the length of the rest of
+the reply, which can reduce client performance. With extended
+messages, all packet headers have a fixed length of 32 bytes, and
+although this results in more traffic over the network due to padding,
+the resulting layout is friendlier for performance.
+
 Replies need not be sent in the same order as requests (i.e.,
 requests may be handled by the server asynchronously), and
 structured reply chunks from one request may be interleaved with
 reply messages from
@@ -343,7 +358,9 @@ may be useful.
 #### Request message
 
-The request message, sent by the client, looks as follows:
+The compact request message, sent by the client when extended
+transactions are not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+looks as follows:
 
 C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)
 C: 16 bits, command flags
@@ -353,14 +370,26 @@ C: 64 bits, offset (unsigned)
 C: 32 bits, length (unsigned)
 C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
 
+If negotiation agreed on extended transactions with
+`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
+
+C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
+C: 16 bits, command flags
+C: 16 bits, type
+C: 64 bits, handle
+C: 64 bits, offset (unsigned)
+C: 64 bits, length (unsigned)
+C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
+
 #### Simple reply message
 
 The simple reply message MUST be sent by the server in response to all
 requests if structured replies have not been negotiated using
-`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
-reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
-but only if the reply has no data payload. The message looks as
-follows:
+`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
+negotiated, a simple reply MAY be used as a reply to any request other
+than `NBD_CMD_READ`, but only if the reply has no data payload.
If
+extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks as follows:
 
 S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
    `NBD_REPLY_MAGIC`)
@@ -369,6 +398,16 @@ S: 64 bits, handle
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
    *error* is zero)
 
+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
+S: 32 bits, error (MAY be zero)
+S: 64 bits, handle
+S: 128 bits, padding (MUST be zero)
+S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
+   *error* is zero)
+
 #### Structured reply chunk message
 
 Some of the major downsides of the default simple reply to
@@ -410,7 +449,9 @@ considered successful only if it did not contain any error chunks,
 although the client MAY be able to determine partial success based on
 the chunks received.
 
-A structured reply chunk message looks as follows:
+If extended headers were not negotiated using
+`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
+follows:
 
 S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
 S: 16 bits, flags
@@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free division of the
 overall server traffic into individual reply messages; the *type*
 field describes how to further interpret the payload.
 
+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)
+S: 16 bits, flags
+S: 16 bits, type
+S: 64 bits, handle
+S: 64 bits, length of payload (unsigned)
+S: 64 bits, padding (MUST be zero)
+S: *length* bytes of payload data (if *length* is nonzero)
+
 #### Terminating the transmission phase
 
 There are two methods of terminating the transmission phase:
@@ -870,15 +922,19 @@ The procedure works as follows:
 
   server supports.
 - During transmission, a client can then indicate interest in
   metadata for a given region by way of the `NBD_CMD_BLOCK_STATUS`
   command,
-  where *offset* and *length* indicate the area of interest. The
-  server MUST then respond with the requested information, for all
+  where *offset* and *length* indicate the area of interest.
+- The server MUST then respond with the requested information, for all
   contexts which were selected during negotiation. For every metadata
-  context, the server sends one set of extent chunks, where the sizes
-  of the extents MUST be less than or equal to the length as specified
-  in the request. Each extent comes with a *flags* field, the
-  semantics of which are defined by the metadata context.
-- A server MUST reply to `NBD_CMD_BLOCK_STATUS` with a structured
-  reply of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
+  context, the server sends one set of extent chunks, using
+  `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`
+  (the latter is only possible if the client also negotiated
+  `NBD_OPT_EXTENDED_HEADERS`). Each extent comes with a *flags*
+  field, the semantics of which are defined by the metadata context.
+
+The client's requested *size* is only a hint to the server, so the
+summed size of extents in the server's reply may be shorter, or in
+some cases longer, than the original request, and may even differ
+between contexts when multiple metadata contexts were negotiated.
 
 A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a
 nonzero number of metadata contexts during negotiation, and used the
@@ -1179,10 +1235,10 @@ of the newstyle negotiation.
 
     When this command succeeds, the server MUST NOT preserve any
     negotiation state (such as a request for
-    `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
-    `NBD_OPT_SET_META_CONTEXT`) issued before this command. A client
-    SHOULD defer all stateful option requests until after it
-    determines whether encryption is available.
+    `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or
+    metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before
+    this command. A client SHOULD defer all stateful option requests
+    until after it determines whether encryption is available.
 
     See the section on TLS above for further details.
 
@@ -1460,6 +1516,26 @@ of the newstyle negotiation.
 
     option does not select any metadata context, provided the client
     then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.
 
+* `NBD_OPT_EXTENDED_HEADERS` (11)
+
+    The client wishes to use extended headers during the transmission
+    phase. The client MUST NOT send any additional data with the
+    option, and the server SHOULD reject a request that includes data
+    with `NBD_REP_ERR_INVALID`.
+
+    The server replies with the following, or with an error permitted
+    elsewhere in this document:
+
+    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
+      MUST use the 32-byte extended request header, and the server
+      MUST use the 32-byte extended reply header.
+    - For backwards compatibility, clients SHOULD be prepared to also
+      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
+      transmission headers will be used.
+
+    If the client requests `NBD_OPT_STARTTLS` after this option, it
+    MUST renegotiate extended headers.
+
 #### Option reply types
 
 These values are used in the "reply type" field, sent by the server
@@ -1713,12 +1789,12 @@ unrecognized flags.
 
 #### Structured reply types
 
-These values are used in the "type" field of a structured reply.
-Some chunk types can additionally be categorized by role, such as
-*error chunks* or *content chunks*. Each type determines how to
-interpret the "length" bytes of payload. If the client receives
-an unknown or unexpected type, other than an *error chunk*, it
-MUST initiate a hard disconnect.
+These values are used in the "type" field of a structured reply.
Some
+chunk types can additionally be categorized by role, such as *error
+chunks*, *content chunks*, or *status chunks*. Each type determines
+how to interpret the "length" bytes of payload. If the client
+receives an unknown or unexpected type, other than an *error chunk*,
+it MUST initiate a hard disconnect.
 
 * `NBD_REPLY_TYPE_NONE` (0)
 
@@ -1761,13 +1837,34 @@ MUST initiate a hard disconnect.
 
   64 bits: offset (unsigned)
   32 bits: hole size (unsigned, MUST be nonzero)
 
+* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
+
+  This chunk type is in the content chunk category. *length* MUST be
+  exactly 16. The semantics of this chunk mirror those of
+  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
+  size* field. This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload is structured as:
+
+  64 bits: offset (unsigned)
+  64 bits: hole size (unsigned, MUST be nonzero)
+
+  Note that even when extended headers are in use, a server may
+  enforce a maximum block size that is smaller than 32 bits, in which
+  case no valid `NBD_CMD_READ` will have a *length* large enough to
+  require the use of this chunk type. However, a client using
+  extended headers MUST be prepared for the server to use either the
+  compact or extended chunk type.
+
 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
 
-  *length* MUST be 4 + (a positive integer multiple of 8). This reply
-  represents a series of consecutive block descriptors where the sum
-  of the length fields within the descriptors is subject to further
-  constraints documented below. This chunk type MUST appear
-  exactly once per metadata ID in a structured reply.
+  This chunk type is in the status chunk category. *length* MUST be
+  4 + (a positive integer multiple of 8). This reply represents a
+  series of consecutive block descriptors where the sum of the length
+  fields within the descriptors is subject to further constraints
+  documented below.
Each negotiated metadata ID must have exactly one
+  status chunk in the overall structured reply.
 
   The payload starts with:
 
@@ -1796,9 +1893,36 @@ MUST initiate a hard disconnect.
 
   information to the client, if looking up the information would be
   too resource-intensive for the server, so long as at least one
   extent is returned. Servers should however be aware that most
-  clients implementations will then simply ask for the next extent
+  client implementations will then simply ask for the next extent
   instead.
 
+* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
+
+  This chunk type is in the status chunk category. *length* MUST be
+  4 + (a positive multiple of 16). The semantics of this chunk mirror
+  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
+  larger *extent length* field, as well as added padding to ease
+  alignment. This chunk type MUST NOT be used unless extended headers
+  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload starts with:
+
+  32 bits, metadata context ID
+
+  and is followed by a list of one or more descriptors, each with this
+  layout:
+
+  64 bits, length of the extent to which the status below
+     applies (unsigned, MUST be nonzero)
+  32 bits, status flags
+  32 bits, padding (MUST be zero)
+
+  Note that even when extended headers are in use, the client MUST be
+  prepared for the server to use either the compact or extended chunk
+  type, regardless of whether the client's hinted length was more or
+  less than 32 bits, but the server MUST use exactly one of the two
+  chunk types per negotiated metacontext ID.
+
 All error chunk types have bit 15 set, and begin with the same
 *error*, *message length*, and optional *message* fields as
 `NBD_REPLY_TYPE_ERROR`. If nonzero, *message length* indicates
@@ -1812,7 +1936,10 @@ remaining structured fields at the end.
 
   be at least 6. This chunk represents that an error occurred,
   and the client MAY NOT make any assumptions about partial success.
This type SHOULD NOT be used more than once in a
-  structured reply. Valid as a reply to any request.
+  structured reply. Valid as a reply to any request. Note that
+  *message length* MUST NOT exceed the 4096 bytes string length limit,
+  and therefore there is no need for a counterpart extended-length
+  error chunk type.
 
   The payload is structured as:
 
@@ -1867,7 +1994,8 @@ The following request types exist:
 
     If structured replies were not negotiated, then a read request
     MUST always be answered by a simple reply, as documented above
-    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
+    (using `NBD_SIMPLE_REPLY_MAGIC` or `NBD_SIMPLE_REPLY_EXT_MAGIC`
+    according to whether extended headers are in use, and containing
     length bytes of data according to the client's request).
 
    If an error occurs, the server SHOULD set the appropriate error code
@@ -1883,7 +2011,8 @@ The following request types exist:
 
    If structured replies are negotiated, then a read request MUST
    result in a structured reply with one or more chunks (each using
-   magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
+   `NBD_STRUCTURED_REPLY_MAGIC` or `NBD_STRUCTURED_REPLY_EXT_MAGIC`
+   according to whether extended headers are in use), where the final
   chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
   additional constraints.
 
@@ -1897,13 +2026,14 @@ The following request types exist:
 
   chunks that describe data outside the offset and length of the
   request, but MAY send the content chunks in any order (the client
  MUST reassemble content chunks into the correct order), and MAY
-  send additional content chunks even after reporting an error chunk.
-  Note that a request for more than 2^32 - 8 bytes MUST be split
-  into at least two chunks, so as not to overflow the length field
-  of a reply while still allowing space for the offset of each
-  chunk.
When no error is detected, the server MUST send enough
-  data chunks to cover the entire region described by the offset and
-  length of the client's request.
+  send additional content chunks even after reporting an error
+  chunk. Note that if extended headers are not in use, a request
+  for more than 2^32 - 8 bytes MUST be split into at least two
+  chunks, so as not to overflow the length field of a reply while
+  still allowing space for the offset of each chunk. When no error
+  is detected, the server MUST send enough data chunks to cover the
+  entire region described by the offset and length of the client's
+  request.
 
   To minimize traffic, the server MAY use a content or error chunk
   as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but
@@ -2136,13 +2266,19 @@ The following request types exist:
 
    server returned at least one metadata context without an error.
    This in turn requires the client to first negotiate structured
   replies. For a successful return, the server MUST use a structured
-   reply, containing exactly one chunk of type
+   reply, containing exactly one status chunk of type
   `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
   status field of each descriptor is determined by the flags field
   as defined by the metadata context. The server MAY send chunks in
   a different order than the context ids were assigned in reply to
   `NBD_OPT_SET_META_CONTEXT`.
 
+   If extended headers were negotiated via
+   `NBD_OPT_EXTENDED_HEADERS`, the server may use
+   `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` instead of
+   `NBD_REPLY_TYPE_BLOCK_STATUS` as the reply chunk for a metacontext
+   id.
+
   The list of block status descriptors within the
   `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
If the client used From patchwork Fri Dec 3 23:15:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 12655975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EE564C433EF for ; Fri, 3 Dec 2021 23:18:18 +0000 (UTC) Received: from localhost ([::1]:49434 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mtHoo-0001hF-0S for qemu-devel@archiver.kernel.org; Fri, 03 Dec 2021 18:18:18 -0500 Received: from eggs.gnu.org ([209.51.188.92]:37464) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mtHmg-0007HM-Fs for qemu-devel@nongnu.org; Fri, 03 Dec 2021 18:16:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:33364) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mtHmc-00034D-TL for qemu-devel@nongnu.org; Fri, 03 Dec 2021 18:16:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638573362; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TjYgrIAEtIty4X+aptzEXKj4wdU3TPHUuNxJCtjWAdU=; b=M2oL+2db6MCJBMyeko1oe64MGPJfHGlLw7eWmwdU30/qVC2I5qjCPXpLPTmPs7tizw2rIp a1javQVngMBAyy3oKm+0dPqwnxmbcZV2RU72Uf6O96wvIhBJ8LIRKusPmCHYMK8hss6Ko6 IaKMP6TFWdctUZtRtu/LUEcQaQKea8M= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS 
(version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-402-ImXpeQSlP8yuRF2XULVQ-Q-1; Fri, 03 Dec 2021 18:15:59 -0500 X-MC-Unique: ImXpeQSlP8yuRF2XULVQ-Q-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 20F8F801B01; Fri, 3 Dec 2021 23:15:58 +0000 (UTC) Received: from blue.redhat.com (unknown [10.2.16.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id 061D95D9D5; Fri, 3 Dec 2021 23:15:56 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Subject: [PATCH 02/14] qemu-io: Utilize 64-bit status during map Date: Fri, 3 Dec 2021 17:15:27 -0600 Message-Id: <20211203231539.3900865-3-eblake@redhat.com> In-Reply-To: <20211203231539.3900865-1-eblake@redhat.com> References: <20211203231307.wmtbw7r72tyzkkax@redhat.com> <20211203231539.3900865-1-eblake@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=eblake@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -34 X-Spam_score: -3.5 X-Spam_bar: --- X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.717, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , vsementsov@virtuozzo.com, qemu-block@nongnu.org, nbd@other.debian.org, nsoffer@redhat.com, Hanna Reitz , libguestfs@redhat.com Errors-To: 
The block layer has supported 64-bit block status from drivers since
commit 86a3d5c688 ("block: Add .bdrv_co_block_status() callback",
v2.12) and friends, with individual driver callbacks responsible for
capping things where necessary. Artificially capping things below 2G
in the qemu-io 'map' command, added in commit d6a644bbfe ("block: Make
bdrv_is_allocated() byte-based", v2.10), is thus no longer necessary.

One way to test this is with qemu-nbd as server on a raw file larger
than 4G (the entire file should show as allocated), plus 'qemu-io -f
raw -c map nbd://localhost --trace=nbd_\*' as client. Prior to this
patch, the NBD_CMD_BLOCK_STATUS requests are fragmented at 0x7ffffe00
distances; with this patch, the fragmenting changes to 0x7fffffff
(since the NBD protocol is currently still limited to 32-bit
transactions - see block/nbd.c:nbd_client_co_block_status). Then in
later patches, once I add an NBD extension for a 64-bit block status,
the same map command completes with just one NBD_CMD_BLOCK_STATUS.
Signed-off-by: Eric Blake
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 qemu-io-cmds.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 46593d632d8f..954955c12fb9 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1993,11 +1993,9 @@ static int map_is_allocated(BlockDriverState *bs, int64_t offset,
                             int64_t bytes, int64_t *pnum)
 {
     int64_t num;
-    int num_checked;
     int ret, firstret;
 
-    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+    ret = bdrv_is_allocated(bs, offset, bytes, &num);
     if (ret < 0) {
         return ret;
     }
@@ -2009,8 +2007,7 @@ static int map_is_allocated(BlockDriverState *bs, int64_t offset,
         offset += num;
         bytes -= num;
 
-        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+        ret = bdrv_is_allocated(bs, offset, bytes, &num);
         if (ret == firstret && num) {
             *pnum += num;
         } else {

From patchwork Fri Dec 3 23:17:33 2021
X-Patchwork-Submitter: Eric Blake
X-Patchwork-Id: 12656037
From: Eric Blake
To: libguestfs@redhat.com
Subject: [libnbd PATCH 05/13] protocol: Prepare to receive 64-bit replies
Date: Fri, 3 Dec 2021 17:17:33 -0600
Message-Id: <20211203231741.3901263-6-eblake@redhat.com>
In-Reply-To: <20211203231741.3901263-1-eblake@redhat.com>
References: <20211203231307.wmtbw7r72tyzkkax@redhat.com> <20211203231741.3901263-1-eblake@redhat.com>
Cc: nsoffer@redhat.com, vsementsov@virtuozzo.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org

Support receiving headers for 64-bit replies if extended headers were
negotiated. We already insist that the server not send us too much
payload in one reply, so we can exploit that and merge the 64-bit
length back into a normalized 32-bit field for the rest of the payload
length calculations. The NBD protocol specifically made extended
simple and structured replies both occupy 32 bytes, while the handle
field is still at the same offset between all reply types.

Note that if we negotiate extended headers, but a non-compliant server
replies with a non-extended header, we will stall waiting for the
server to send more bytes rather than noticing that the magic number
is wrong. The alternative would be to read just the first 4 bytes of
magic, then determine how many more bytes to expect; but that would
require more states and syscalls, and is not worth it since the
typical server will be compliant.
At this point, h->extended_headers is permanently false (we can't enable it until all other aspects of the protocol have likewise been converted).
---
 lib/internal.h                      |  8 +++-
 generator/states-reply-structured.c | 59 +++++++++++++++++++----------
 generator/states-reply.c            | 31 +++++++++++----
 3 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 07378588..c9f84441 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -222,8 +222,12 @@ struct nbd_handle {
     } __attribute__((packed)) or;
     struct nbd_export_name_option_reply export_name_reply;
     struct nbd_simple_reply simple_reply;
+    struct nbd_simple_reply_ext simple_reply_ext;
     struct {
-      struct nbd_structured_reply structured_reply;
+      union {
+        struct nbd_structured_reply structured_reply;
+        struct nbd_structured_reply_ext structured_reply_ext;
+      } hdr;
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
@@ -233,7 +237,7 @@ struct nbd_handle {
           uint64_t offset; /* Only used for NBD_REPLY_TYPE_ERROR_OFFSET */
         } __attribute__((packed)) error;
       } payload;
-    } __attribute__((packed)) sr;
+    } sr;
     uint16_t gflags;
     uint32_t cflags;
     uint32_t len;
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 5524e000..1b675e8d 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -45,19 +45,23 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length,

 STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.START:
-  /* We've only read the simple_reply.  The structured_reply is longer,
-   * so read the remaining part.
+  /* We've only read the simple_reply.  Unless we have extended headers,
+   * the structured_reply is longer, so read the remaining part.
    */
   if (!h->structured_replies) {
     set_error (0, "server sent unexpected structured reply");
     SET_NEXT_STATE(%.DEAD);
     return 0;
   }
-  h->rbuf = &h->sbuf;
-  h->rbuf += sizeof h->sbuf.simple_reply;
-  h->rlen = sizeof h->sbuf.sr.structured_reply;
-  h->rlen -= sizeof h->sbuf.simple_reply;
-  SET_NEXT_STATE (%RECV_REMAINING);
+  if (h->extended_headers)
+    SET_NEXT_STATE (%CHECK);
+  else {
+    h->rbuf = &h->sbuf;
+    h->rbuf += sizeof h->sbuf.simple_reply;
+    h->rlen = sizeof h->sbuf.sr.hdr.structured_reply;
+    h->rlen -= sizeof h->sbuf.simple_reply;
+    SET_NEXT_STATE (%RECV_REMAINING);
+  }
   return 0;

  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
@@ -75,12 +79,21 @@ STATE_MACHINE {
   struct command *cmd = h->reply_cmd;
   uint16_t flags, type;
   uint64_t cookie;
-  uint32_t length;
+  uint64_t length;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
-  type = be16toh (h->sbuf.sr.structured_reply.type);
-  cookie = be64toh (h->sbuf.sr.structured_reply.handle);
-  length = be32toh (h->sbuf.sr.structured_reply.length);
+  flags = be16toh (h->sbuf.sr.hdr.structured_reply.flags);
+  type = be16toh (h->sbuf.sr.hdr.structured_reply.type);
+  cookie = be64toh (h->sbuf.sr.hdr.structured_reply.handle);
+  if (h->extended_headers) {
+    length = be64toh (h->sbuf.sr.hdr.structured_reply_ext.length);
+    if (h->sbuf.sr.hdr.structured_reply_ext.pad) {
+      set_error (0, "server sent non-zero padding in structured reply header");
+      SET_NEXT_STATE(%.DEAD);
+      return 0;
+    }
+  }
+  else
+    length = be32toh (h->sbuf.sr.hdr.structured_reply.length);

   assert (cmd);
   assert (cmd->cookie == cookie);
@@ -97,6 +110,10 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.DEAD);
     return 0;
   }
+  /* For convenience, we now normalize extended replies into compact,
+   * doable since we validated length fits in 32 bits.
+   */
+  h->sbuf.sr.hdr.structured_reply.length = length;

   if (NBD_REPLY_TYPE_IS_ERR (type)) {
     if (length < sizeof h->sbuf.sr.payload.error.error) {
@@ -207,7 +224,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     msglen = be16toh (h->sbuf.sr.payload.error.error.len);
     if (msglen > length - sizeof h->sbuf.sr.payload.error.error ||
         msglen > sizeof h->sbuf.sr.payload.error.msg) {
@@ -233,9 +250,9 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     msglen = be16toh (h->sbuf.sr.payload.error.error.len);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     length -= sizeof h->sbuf.sr.payload.error.error + msglen;
@@ -281,7 +298,7 @@ STATE_MACHINE {
     return 0;
   case 0:
     error = be32toh (h->sbuf.sr.payload.error.error.error);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     assert (cmd); /* guaranteed by CHECK */
     error = nbd_internal_errno_of_nbd_error (error);
@@ -339,7 +356,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.sr.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -377,7 +394,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.sr.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -454,7 +471,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -489,7 +506,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -535,7 +552,7 @@ STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.FINISH:
   uint16_t flags;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
+  flags = be16toh (h->sbuf.sr.hdr.structured_reply.flags);
   if (flags & NBD_REPLY_FLAG_DONE) {
     SET_NEXT_STATE (%^FINISH_COMMAND);
   }
diff --git a/generator/states-reply.c b/generator/states-reply.c
index 9099a76a..949e982e 100644
--- a/generator/states-reply.c
+++ b/generator/states-reply.c
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -68,7 +68,10 @@ STATE_MACHINE {
   assert (h->rlen == 0);

   h->rbuf = &h->sbuf;
-  h->rlen = sizeof h->sbuf.simple_reply;
+  if (h->extended_headers)
+    h->rlen = sizeof h->sbuf.simple_reply_ext;
+  else
+    h->rlen = sizeof h->sbuf.simple_reply;

   r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
   if (r == -1) {
@@ -113,13 +116,27 @@ STATE_MACHINE {
   uint64_t cookie;

   magic = be32toh (h->sbuf.simple_reply.magic);
-  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
+  switch (magic) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+  case NBD_SIMPLE_REPLY_EXT_MAGIC:
+    if ((magic == NBD_SIMPLE_REPLY_MAGIC) == h->extended_headers)
+      goto invalid;
+    if (magic == NBD_SIMPLE_REPLY_EXT_MAGIC &&
+        (h->sbuf.simple_reply_ext.pad1 || h->sbuf.simple_reply_ext.pad2)) {
+      set_error (0, "server sent non-zero padding in simple reply header");
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
     SET_NEXT_STATE (%SIMPLE_REPLY.START);
-  }
-  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+  case NBD_STRUCTURED_REPLY_EXT_MAGIC:
+    if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
+      goto invalid;
     SET_NEXT_STATE (%STRUCTURED_REPLY.START);
-  }
-  else {
+    break;
+  default:
+  invalid:
     SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization.
                              */
     set_error (0, "invalid reply magic");
     return 0;

From patchwork Fri Dec 3 23:15:31 2021
From: Eric Blake
To: qemu-devel@nongnu.org
Subject: [PATCH 06/14] nbd: Prepare for 64-bit requests
Date: Fri, 3 Dec 2021 17:15:31 -0600
Message-Id: <20211203231539.3900865-7-eblake@redhat.com>
In-Reply-To: <20211203231539.3900865-1-eblake@redhat.com>
References: <20211203231307.wmtbw7r72tyzkkax@redhat.com> <20211203231539.3900865-1-eblake@redhat.com>
Cc: Kevin Wolf, vsementsov@virtuozzo.com, qemu-block@nongnu.org, nbd@other.debian.org, nsoffer@redhat.com, Hanna Reitz, libguestfs@redhat.com
Widen the length field of NBDRequest to 64 bits, although we can assert that all current uses are still under 32 bits.  Move the request magic number to nbd.h, to live alongside the reply magic number.  Add a bool that will eventually track whether the client successfully negotiated extended headers with the server, allowing the nbd driver to pass larger requests along where possible; in this patch it always remains false, so there is no semantic change yet.

Signed-off-by: Eric Blake
---
 include/block/nbd.h | 19 +++++++++++--------
 nbd/nbd-internal.h  |  3 +--
 block/nbd.c         | 35 ++++++++++++++++++++++-----------
 nbd/client.c        |  8 +++++---
 nbd/server.c        | 11 ++++++++---
 nbd/trace-events    |  8 ++++----
 6 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3d0689b69367..732314aaba11 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -52,17 +52,16 @@ typedef struct NBDOptionReplyMetaContext {

 /* Transmission phase structs
  *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
-struct NBDRequest {
+typedef struct NBDRequest {
     uint64_t handle;
     uint64_t from;
-    uint32_t len;
+    uint64_t len;    /* Must fit 32 bits unless extended headers negotiated */
     uint16_t flags;  /* NBD_CMD_FLAG_* */
-    uint16_t type;   /* NBD_CMD_* */
-};
-typedef struct NBDRequest NBDRequest;
+    uint16_t type;   /* NBD_CMD_* */
+} NBDRequest;

 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -235,6 +234,9 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

+/* Transmission request structure */
+#define NBD_REQUEST_MAGIC           0x25609513
+
 /* Two types of reply structures */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
@@ -293,6 +295,7 @@ struct NBDExportInfo {
     /* In-out fields, set by client before nbd_receive_negotiate() and
      * updated by server results during nbd_receive_negotiate() */
     bool structured_reply;
+    bool extended_headers;
     bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS */

     /* Set by server results during nbd_receive_negotiate() and
@@ -322,7 +325,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                             Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
-int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                    NBDReply *reply, Error **errp);
 int nbd_client(int fd);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 1b2141ab4b2d..0016793ff4b1 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -1,7 +1,7 @@
 /*
  * NBD Internal Declarations
  *
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -45,7 +45,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE    (8 + 8 + 8 + 4 + 124)

 #define NBD_INIT_MAGIC                 0x4e42444d41474943LL /* ASCII "NBDMAGIC" */
-#define NBD_REQUEST_MAGIC              0x25609513
 #define NBD_OPTS_MAGIC                 0x49484156454F5054LL /* ASCII "IHAVEOPT" */
 #define NBD_CLIENT_MAGIC               0x0000420281861253LL
 #define NBD_REP_MAGIC                  0x0003e889045565a9LL
diff --git a/block/nbd.c b/block/nbd.c
index 5ef462db1b7f..3e9875241bec 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2,7 +2,7 @@
  * QEMU Block driver for NBD
  *
  * Copyright (c) 2019 Virtuozzo International GmbH.
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  * Author: Laurent Vivier
  *
@@ -300,7 +300,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
          */
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);

         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, bs);
@@ -470,7 +470,7 @@ static int nbd_co_send_request(BlockDriverState *bs,

     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
         if (nbd_client_connected(s) && rc >= 0) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
@@ -481,7 +481,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         }
         qio_channel_set_cork(s->ioc, false);
     } else {
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
     }

 err:
@@ -1252,10 +1252,11 @@ static int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
     NBDRequest request = {
         .type = NBD_CMD_WRITE_ZEROES,
         .from = offset,
-        .len = bytes,  /* .len is uint32_t actually */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+    /* rely on max_pwrite_zeroes */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);
     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1302,10 +1303,11 @@ static int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
     NBDRequest request = {
         .type = NBD_CMD_TRIM,
         .from = offset,
-        .len = bytes,  /* len is uint32_t */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+    /* rely on max_pdiscard */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);
     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1327,8 +1329,7 @@ static int coroutine_fn nbd_client_co_block_status(
     NBDRequest request = {
         .type = NBD_CMD_BLOCK_STATUS,
         .from = offset,
-        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-                   MIN(bytes, s->info.size - offset)),
+        .len = MIN(bytes, s->info.size - offset),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };

@@ -1338,6 +1339,10 @@ static int coroutine_fn nbd_client_co_block_status(
         *file = bs;
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
+    if (!s->info.extended_headers) {
+        request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+                          request.len);
+    }

     /*
      * Work around the fact that the block layer doesn't do
@@ -1415,7 +1420,7 @@ static void nbd_client_close(BlockDriverState *bs)
     NBDRequest request = { .type = NBD_CMD_DISC };

     if (s->ioc) {
-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);
     }

     nbd_teardown_connection(bs);
@@ -1877,6 +1882,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;

+    /*
+     * Assume that if the server supports extended headers, it also
+     * supports unlimited size zero and trim commands.
+     */
+    if (s->info.extended_headers) {
+        bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+    }
+
     if (s->info.opt_block &&
         s->info.opt_block > bs->bl.opt_transfer) {
         bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client.c b/nbd/client.c
index 8f137c2320bb..aa162b9d08d5 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2016-2019 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  * Copyright (C) 2005 Anthony Liguori
  *
  * Network Block Device Client Side
@@ -1221,7 +1221,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
             if (nbd_drop(ioc, 124, NULL) == 0) {
                 NBDRequest request = { .type = NBD_CMD_DISC };

-                nbd_send_request(ioc, &request);
+                nbd_send_request(ioc, &request, false);
             }
             break;
         default:
@@ -1345,10 +1345,12 @@ int nbd_disconnect(int fd)

 #endif /* __linux__ */

-int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
 {
     uint8_t buf[NBD_REQUEST_SIZE];

+    assert(!ext_hdr);
+    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index 64845542fd6b..4306fa7b426c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1436,7 +1436,7 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24);
+    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);
@@ -2324,7 +2324,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
         request->type == NBD_CMD_CACHE)
     {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
+            error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }
@@ -2340,6 +2340,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }

     if (request->type == NBD_CMD_WRITE) {
+        assert(request->len <= UINT32_MAX);
         if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
                      errp) < 0)
         {
@@ -2361,7 +2362,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }
     if (request->from > client->exp->size ||
         request->len > client->exp->size - request->from) {
-        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
+        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu64
                    ", Size: %" PRIu64, request->from, request->len,
                    client->exp->size);
         return (request->type == NBD_CMD_WRITE ||
@@ -2424,6 +2425,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_READ);
+    assert(request->len <= INT_MAX);

     /* XXX: NBD Protocol only documents use of FUA with WRITE */
     if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -2475,6 +2477,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_CACHE);
+    assert(request->len <= INT_MAX);

     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
@@ -2508,6 +2511,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (request->flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
+        assert(request->len <= INT_MAX);
         ret = blk_pwrite(exp->common.blk, request->from, data, request->len,
                          flags);
         return nbd_send_generic_reply(client, request->handle, ret,
@@ -2551,6 +2555,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
+        assert(request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
diff --git a/nbd/trace-events b/nbd/trace-events
index c4919a2dd581..d18da8b0b743 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -31,7 +31,7 @@ nbd_client_loop(void) "Doing NBD loop"
 nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
 nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
-nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
+nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"
@@ -60,7 +60,7 @@ nbd_negotiate_options_check_option(uint32_t option, const char *name) "Checking
 nbd_negotiate_begin(void) "Beginning negotiation"
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags) "advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
-nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }"
+nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint64_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), len = %d"
@@ -70,6 +70,6 @@ nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size)
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
-nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
-nbd_co_receive_align_compliance(const char *op, uint64_t from, uint32_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx32 ", align=0x%" PRIx32
+nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"
From: Eric Blake
To: libguestfs@redhat.com
Subject: [libnbd PATCH 08/13] block_status: Track 64-bit extents internally
Date: Fri, 3 Dec 2021 17:17:36 -0600
Message-Id: <20211203231741.3901263-9-eblake@redhat.com>
In-Reply-To: <20211203231741.3901263-1-eblake@redhat.com>
References: <20211203231307.wmtbw7r72tyzkkax@redhat.com> <20211203231741.3901263-1-eblake@redhat.com>

When extended headers are in use, the server can send us 64-bit extents, even for a 32-bit query
(if the server knows the entire image is data, for example).  For maximum flexibility, we are thus better off storing 64-bit lengths internally, even if we have to convert them back to 32-bit lengths when invoking the user's 32-bit callback.  The next patch will then add a new API for letting the user access the full 64-bit extent information.  The goal is to let both APIs work all the time, regardless of the size of extents the server actually answered with.

Note that when using the old nbd_block_status() API with a server that lacks extended headers, we now pay a double-conversion speed penalty (converting the server's 32-bit answer into 64-bit internally and back to 32-bit for the callback).  But the penalty will not be a problem for applications using the new nbd_block_status_64() API (we have to give a 64-bit answer no matter what the server answered), and ideally the situation will become less common as more servers learn extended headers.  So for now I chose to unconditionally use a 64-bit internal representation; but if it turns out to cause noticeable degradation, we could tweak things to conditionally retain a 32-bit internal representation for servers lacking extended headers, at the expense of more code maintenance.

One of the trickier aspects of this patch is auditing that both the user's extent and our malloc'd shim get cleaned up exactly once on all possible paths, so that there is neither a leak nor a double free.
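The in-place widening used below deserves a standalone illustration, since it is easy to get wrong.  This sketch uses hypothetical names (`extent64`, `widen_extents`), not libnbd's real API, and for simplicity skips the be32toh() byte-swapping that the real state machine performs on each word as it widens:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the internal extent type: the wire format
 * packs each extent as two 32-bit words (length, flags); internally we
 * want this wider form. */
typedef struct { uint64_t length; uint64_t flags; } extent64;

/* Expand 'count' raw 32-bit pairs, stored at the front of 'buf', into
 * 'count' extent64 entries occupying the same allocation.  Iterating in
 * reverse is essential: wide entry i lands on the bytes holding narrow
 * pairs 2i and 2i+1, so a forward pass would clobber pairs it has not
 * read yet (writing wide entry 0 already destroys narrow pair 1). */
static void
widen_extents (extent64 *buf, size_t count)
{
  for (size_t i = count; i > 0; ) {
    uint32_t pair[2];

    --i;
    /* Read the narrow pair first (memcpy avoids aliasing trouble)... */
    memcpy (pair, (const char *) buf + i * sizeof pair, sizeof pair);
    /* ...then overwrite that region with the wide entry. */
    buf[i].length = pair[0];
    buf[i].flags = pair[1];
  }
}
```

The only index where the read and write regions overlap is i == 0, which is why the pair is copied into locals before the wide entry is stored.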
---
 lib/internal.h                      |  7 +++-
 generator/states-reply-structured.c | 31 ++++++++++-----
 lib/handle.c                        |  4 +-
 lib/rw.c                            | 59 ++++++++++++++++++++++++++++-
 4 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 06f3a65c..4800df83 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -75,7 +75,7 @@ struct export {
 
 struct command_cb {
   union {
-    nbd_extent_callback extent;
+    nbd_extent64_callback extent;
     nbd_chunk_callback chunk;
     nbd_list_callback list;
     nbd_context_callback context;
@@ -286,7 +286,10 @@ struct nbd_handle {
 
   /* When receiving block status, this is used. */
   uint32_t bs_contextid;
-  uint32_t *bs_entries;
+  union {
+    nbd_extent *normal;  /* Our 64-bit preferred internal form */
+    uint32_t *narrow;    /* 32-bit form of NBD_REPLY_TYPE_BLOCK_STATUS */
+  } bs_entries;
 
   /* Commands which are waiting to be issued [meaning the request
    * packet is sent to the server].  This is used as a simple linked
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index a3e0e2ac..71c761e9 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -494,6 +494,7 @@ STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.RECV_BS_CONTEXTID:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
+  uint32_t count;
 
   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -508,15 +509,19 @@ STATE_MACHINE {
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (length >= 12);
     length -= sizeof h->bs_contextid;
+    count = length / (2 * sizeof (uint32_t));
 
-    free (h->bs_entries);
-    h->bs_entries = malloc (length);
-    if (h->bs_entries == NULL) {
+    /* Read raw data into a subset of h->bs_entries, then expand it
+     * into place later during byte-swapping.
+     */
+    free (h->bs_entries.normal);
+    h->bs_entries.normal = malloc (count * sizeof *h->bs_entries.normal);
+    if (h->bs_entries.normal == NULL) {
       SET_NEXT_STATE (%.DEAD);
       set_error (errno, "malloc");
       return 0;
     }
-    h->rbuf = h->bs_entries;
+    h->rbuf = h->bs_entries.narrow;
     h->rlen = length;
     SET_NEXT_STATE (%RECV_BS_ENTRIES);
   }
@@ -528,6 +533,7 @@ STATE_MACHINE {
   uint32_t count;
   size_t i;
   uint32_t context_id;
+  uint32_t *raw;
   struct meta_context *meta_context;
 
   switch (recv_into_rbuf (h)) {
@@ -542,15 +548,20 @@ STATE_MACHINE {
     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
-    assert (h->bs_entries);
+    assert (h->bs_entries.normal);
    assert (length >= 12);
-    count = (length - sizeof h->bs_contextid) / sizeof *h->bs_entries;
+    count = (length - sizeof h->bs_contextid) / (2 * sizeof (uint32_t));
 
     /* Need to byte-swap the entries returned, but apart from that we
-     * don't validate them.
+     * don't validate them.  Reverse order is essential, since we are
+     * expanding in-place from narrow to wider type.
      */
-    for (i = 0; i < count; ++i)
-      h->bs_entries[i] = be32toh (h->bs_entries[i]);
+    raw = h->bs_entries.narrow;
+    for (i = count; i > 0; ) {
+      --i;
+      h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
+      h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+    }
 
     /* Look up the context ID. */
     context_id = be32toh (h->bs_contextid);
@@ -566,7 +577,7 @@ STATE_MACHINE {
 
       if (CALL_CALLBACK (cmd->cb.fn.extent,
                          meta_context->name, cmd->offset,
-                         h->bs_entries, count,
+                         h->bs_entries.normal, count,
                          &error) == -1)
         if (cmd->error == 0)
           cmd->error = error ? error : EPROTO;
diff --git a/lib/handle.c b/lib/handle.c
index cbb37e89..74fe87ec 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -123,7 +123,7 @@ nbd_close (struct nbd_handle *h)
   /* Free user callbacks first. */
   nbd_unlocked_clear_debug_callback (h);
 
-  free (h->bs_entries);
+  free (h->bs_entries.normal);
   nbd_internal_reset_size_and_flags (h);
   nbd_internal_free_option (h);
   free_cmd_list (h->cmds_to_issue);
diff --git a/lib/rw.c b/lib/rw.c
index 16c2e848..f36f4e15 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -42,6 +42,50 @@ wait_for_command (struct nbd_handle *h, int64_t cookie)
   return r == -1 ? -1 : 0;
 }
 
+/* Convert from 64-bit to 32-bit extent callback. */
+static int
+nbd_convert_extent (void *data, const char *metacontext, uint64_t offset,
+                    nbd_extent *entries, size_t nr_entries, int *error)
+{
+  nbd_extent_callback *cb = data;
+  uint32_t *array = malloc (nr_entries * 2 * sizeof *array);
+  size_t i;
+  int ret;
+
+  if (array == NULL) {
+    set_error (*error = errno, "malloc");
+    return -1;
+  }
+
+  for (i = 0; i < nr_entries; i++) {
+    array[i * 2] = entries[i].length;
+    array[i * 2 + 1] = entries[i].flags;
+    /* If an extent is larger than 32 bits, silently truncate the rest
+     * of the server's response.  Technically, such a server was
+     * non-compliant if the client did not negotiate extended headers,
+     * but it is easier to let the caller make progress than to make
+     * the call fail.  Rather than track the connection's alignment,
+     * just blindly truncate the large extent to 4G-64M.
+     */
+    if (entries[i].length > UINT32_MAX) {
+      array[i++ * 2] = -MAX_REQUEST_SIZE;
+      break;
+    }
+  }
+
+  ret = CALL_CALLBACK (*cb, metacontext, offset, array, i * 2, error);
+  free (array);
+  return ret;
+}
+
+static void
+nbd_convert_extent_free (void *data)
+{
+  nbd_extent_callback *cb = data;
+  FREE_CALLBACK (*cb);
+  free (cb);
+}
+
 /* Issue a read command and wait for the reply.
  */
 int
 nbd_unlocked_pread (struct nbd_handle *h, void *buf,
@@ -469,12 +513,23 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
                                nbd_completion_callback *completion,
                                uint32_t flags)
 {
-  struct command_cb cb = { .fn.extent = *extent,
+  nbd_extent_callback *shim = malloc (sizeof *shim);
+  struct command_cb cb = { .fn.extent.callback = nbd_convert_extent,
+                           .fn.extent.user_data = shim,
+                           .fn.extent.free = nbd_convert_extent_free,
                            .completion = *completion };
 
+  if (shim == NULL) {
+    set_error (errno, "malloc");
+    return -1;
+  }
+  *shim = *extent;
+  SET_CALLBACK_TO_NULL (*extent);
+
   if (h->strict & LIBNBD_STRICT_COMMANDS) {
     if (!h->structured_replies) {
       set_error (ENOTSUP, "server does not support structured replies");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
 
@@ -482,11 +537,11 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
       set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                  "either you did not call nbd_add_meta_context before "
                  "connecting or the server does not support it");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
   }
 
-  SET_CALLBACK_TO_NULL (*extent);
   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS,
                                      offset, count, EINVAL, NULL, &cb);
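
The 64-to-32 truncation performed by nbd_convert_extent above can be
exercised in isolation.  This standalone sketch uses invented names
(`extent64`, `narrow_extents`) and hard-codes MAX_REQUEST_SIZE to the
64M implied by the commit message's "4G-64M"; it mirrors only the
loop's clamping behavior, not the callback plumbing:

```c
#include <stddef.h>
#include <stdint.h>

/* 64M; the value behind the "4G-64M" truncation in the patch. */
#define MAX_REQUEST_SIZE (64 * 1024 * 1024)

/* Illustrative stand-in for libnbd's 64-bit extent form. */
typedef struct {
  uint64_t length;
  uint64_t flags;
} extent64;

/* Narrow 64-bit extents into the interleaved uint32_t (length, flags)
 * array that a 32-bit callback expects.  An extent longer than
 * UINT32_MAX is clamped to 4G-64M and terminates the list, mirroring
 * nbd_convert_extent.  Returns the number of uint32_t slots written
 * (two per extent reported); 'out' must hold 2 * nr slots. */
static size_t
narrow_extents (uint32_t *out, const extent64 *in, size_t nr)
{
  size_t i;

  for (i = 0; i < nr; i++) {
    out[i * 2] = (uint32_t) in[i].length;
    out[i * 2 + 1] = (uint32_t) in[i].flags;
    if (in[i].length > UINT32_MAX) {
      /* (uint32_t) -MAX_REQUEST_SIZE wraps to 4G-64M. */
      out[i * 2] = (uint32_t) -MAX_REQUEST_SIZE;
      i++;       /* the clamped extent still counts as reported */
      break;
    }
  }
  return i * 2;
}
```

As in the patch, extents after the oversized one are silently dropped,
so a caller sees a shorter but still monotonically advancing answer.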