
[v2,6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT

Message ID 20221114224655.2186173-7-eblake@redhat.com
State New, archived
Series NBD spec changes for 64-bit extensions

Commit Message

Eric Blake Nov. 14, 2022, 10:46 p.m. UTC
Rather than requiring all servers and clients to have a 32-bit limit
on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
support for a 64-bit single I/O transaction now.
NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.

By standardizing this, we require all clients to be prepared to
handle both hole reply types, even though most server implementations
of extended replies are likely to send only one of them.
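The client-side burden described above can be sketched in C. This is a minimal sketch of a client that has negotiated extended headers and accepts either hole chunk type. It assumes the following values: `NBD_REPLY_TYPE_OFFSET_HOLE` is type 2 with a 12-byte payload, and `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` is type 3 with a 16-byte payload, as proposed in this RFC. All fields are big-endian. The struct and function names are hypothetical, and the sketch assumes the chunk header has already been parsed elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type 2 is the existing compact hole chunk; type 3 is the value this
 * RFC proposes for the 64-bit variant. */
#define NBD_REPLY_TYPE_OFFSET_HOLE     2
#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT 3

/* Hypothetical decoded form shared by both hole chunk types. */
struct hole_chunk {
    uint64_t offset;
    uint64_t size;
};

/* NBD fields are transmitted in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Decode either hole chunk payload into one representation.  Rejects a
 * wrong payload length, a zero hole size, and the extended type when
 * extended headers were not negotiated. */
static bool parse_hole_chunk(uint16_t type, const unsigned char *payload,
                             uint64_t length, bool extended_headers,
                             struct hole_chunk *out)
{
    switch (type) {
    case NBD_REPLY_TYPE_OFFSET_HOLE:       /* 64-bit offset, 32-bit size */
        if (length != 12)
            return false;
        out->offset = get_be64(payload);
        out->size = get_be32(payload + 8);
        break;
    case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:   /* 64-bit offset, 64-bit size */
        if (!extended_headers || length != 16)
            return false;
        out->offset = get_be64(payload);
        out->size = get_be64(payload + 8);
        break;
    default:
        return false;
    }
    return out->size != 0;                 /* hole size MUST be nonzero */
}

/* Overflow-safe check that the hole lies inside the original request
 * [req_off, req_off + req_len). */
static bool hole_in_request(const struct hole_chunk *h,
                            uint64_t req_off, uint64_t req_len)
{
    return h->offset >= req_off &&
           h->offset - req_off <= req_len &&
           h->size <= req_len - (h->offset - req_off);
}
```

Both types decode into one structure. As a result, the rest of the client (reassembly, zero-filling the read buffer) does not need to care which variant the server chose.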

---

Because this adds a corner case that is likely to get less testing, I
have split it into its own optional patch.  I implemented it in my
proof-of-concept, but am happy to drop this patch from what actually
goes upstream.

In particular, if we foresee clients and servers that WANT to support
a payload larger than 4G, it may be worth introducing, at the same
time as this reply type, an NBD_INFO_* that supplies 64-bit block
sizing information, rather than living with the inherent 32-bit limit
of our current NBD_INFO_BLOCK_SIZE.
---
 doc/proto.md | 73 ++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

Patch

diff --git a/doc/proto.md b/doc/proto.md
index 645a736..9c04411 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -2008,19 +2008,25 @@  size.
   64 bits: offset (unsigned)  
   32 bits: hole size (unsigned, MUST be nonzero)  

-  At this time, although servers that support extended headers are
-  permitted to accept client requests for `NBD_CMD_READ` with an
-  effect length larger than any advertised maximum block payload size
-  by splitting the reply into multiple chunks, portable clients SHOULD
-  NOT request a read *length* larger than 32 bits (corresponding to
-  the maximum block payload constraint implied by
-  `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the
-  *hole size* does not represent an arbitrary limitation.  Should a
-  future scenario arise where it can be demonstrated that a client and
-  server would benefit from an extension allowing a maximum block
-  payload size to be larger than 32 bits, that extension would also
-  introduce a counterpart reply type that can express a 64-bit *hole
-  size*.
+* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
+
+  This chunk type is in the content chunk category.  *length* MUST be
+  exactly 16.  The semantics of this chunk mirror those of
+  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
+  size* field.  This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload is structured as:
+
+  64 bits: offset (unsigned)  
+  64 bits: hole size (unsigned, MUST be nonzero)  
+
+  Note that even though extended headers are in use, a server may
+  enforce a maximum block size that is smaller than 32 bits, in which
+  case no valid `NBD_CMD_READ` will have a *length* large enough to
+  require the use of this chunk type.  However, a client using
+  extended headers MUST be prepared for the server to use either the
+  compact or extended chunk type.

 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)

@@ -2218,26 +2224,27 @@  The following request types exist:
     the following additional constraints.

     The server MAY split the reply into any number of content chunks
-    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`);
-    each chunk MUST describe at least one byte, although to minimize
-    overhead, the server SHOULD use chunks with lengths and offsets as
-    an integer multiple of 512 bytes, where possible (the first and
-    last chunk of an unaligned read being the most obvious places for
-    an exception).  The server MUST NOT send content chunks that
-    overlap with any earlier content or error chunk, and MUST NOT send
-    chunks that describe data outside the offset and length of the
-    request, but MAY send the content chunks in any order (the client
-    MUST reassemble content chunks into the correct order), and MAY
-    send additional content chunks even after reporting an error
-    chunk.  A server MAY support read requests larger than the maximum
-    block payload size by splitting the response across multiple
-    chunks (in particular, if extended headers are not in use, a
-    request for more than 2^32 - 8 bytes containing data rather than
-    holes MUST be split to avoid overflowing the 32-bit
-    `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
-    also permitted to reject large read requests up front, so a client
-    should be prepared to retry with smaller requests if a large
-    request fails.
+    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE` for
+    structured replies, additionally `NBD_REPLY_TYPE_OFFSET_HOLE_EXT`
+    for extended headers); each chunk MUST describe at least one byte,
+    although to minimize overhead, the server SHOULD use chunks with
+    lengths and offsets as an integer multiple of 512 bytes, where
+    possible (the first and last chunk of an unaligned read being the
+    most obvious places for an exception).  The server MUST NOT send
+    content chunks that overlap with any earlier content or error
+    chunk, and MUST NOT send chunks that describe data outside the
+    offset and length of the request, but MAY send the content chunks
+    in any order (the client MUST reassemble content chunks into the
+    correct order), and MAY send additional content chunks even after
+    reporting an error chunk.  A server MAY support read requests
+    larger than the maximum block payload size by splitting the
+    response across multiple chunks (in particular, if extended
+    headers are not in use, a request for more than 2^32 - 8 bytes
+    containing data rather than holes MUST be split to avoid
+    overflowing the 32-bit `NBD_REPLY_TYPE_OFFSET_DATA` length field);
+    however, the server is also permitted to reject large read
+    requests up front, so a client should be prepared to retry with
+    smaller requests if a large request fails.

     When no error is detected, the server MUST send enough data chunks
     to cover the entire region described by the offset and length of