Message ID | 1459779314-12266-1-git-send-email-eblake@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/04/2016 05:15 PM, Eric Blake wrote: > qemu already has an existing server implementation option that will > explicitly search the payload of NBD_CMD_WRITE for large blocks of > zeroes, and punch holes in the underlying file. For old clients > that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a > workaround to keep the server's destination file approximately as > sparse as the client's source. However, for new clients that know > how to explicitly request holes, it is unnecessary overhead; and > can lead to the server punching a hole and risking fragmentation or > future ENOSPC even when the client explicitly wanted to write > zeroes rather than a hole. So it makes sense to let the new > NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES. > > Signed-off-by: Eric Blake <eblake@redhat.com> what behaviour do you expect for QCOW2 file? We should fully provision that image as far as I could understand, i.e. allocate data blocks with zero content. I think that this would work.
On 04/04/2016 08:47 AM, Denis V. Lunev wrote: > On 04/04/2016 05:15 PM, Eric Blake wrote: >> qemu already has an existing server implementation option that will >> explicitly search the payload of NBD_CMD_WRITE for large blocks of >> zeroes, and punch holes in the underlying file. For old clients >> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a >> workaround to keep the server's destination file approximately as >> sparse as the client's source. However, for new clients that know >> how to explicitly request holes, it is unnecessary overhead; and >> can lead to the server punching a hole and risking fragmentation or >> future ENOSPC even when the client explicitly wanted to write >> zeroes rather than a hole. So it makes sense to let the new >> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES. >> >> Signed-off-by: Eric Blake <eblake@redhat.com> > > what behaviour do you expect for QCOW2 file? > We should fully provision that image as far as I could understand, > i.e. allocate data blocks with zero content. For an old client (that doesn't know how to use WRITE_ZEROES), the client will send ALL data via CMD_WRITE; and then the server pays attention to its command-line flags on whether to parse for zeroes (which is already tri-state, between always allocate, use WRITE_SAME where possible, and TRIM where possible). The result is that the server can be commanded at startup to make its file sparse, even when the client did not have a sparse file to start with. For a new client, the client will always use WRITE_ZEROES when it is okay with the server punching holes, and WRITE_ZEROES+NO_HOLE when it wants to compress the network stream but still force allocation. Therefore, the server spending time looking for zeroes during WRITE is wasted effort, and the client should always send WRITE+NO_HOLE for data that is not a hole on the client's side of things (even if that content is all zeroes). The result is that the server can now create a file with the same sparseness pattern as the client, assuming that client and server have the same granularity of hole sizing. With a qcow2 file as the underlying file of the qemu nbd server, the behavior then depends on whether the qcow2 file is version 2 (no efficient ways to represent all zeroes, but there ARE some shortcuts that can be taken when you know the file is backed by something else that is all zero) or version 3 (where there is an explicit and efficient way to mark an all-zero cluster). > > I think that this would work. >
On 4 Apr 2016, at 15:15, Eric Blake <eblake@redhat.com> wrote: > qemu already has an existing server implementation option that will > explicitly search the payload of NBD_CMD_WRITE for large blocks of > zeroes, and punch holes in the underlying file. For old clients > that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a > workaround to keep the server's destination file approximately as > sparse as the client's source. However, for new clients that know > how to explicitly request holes, it is unnecessary overhead; and > can lead to the server punching a hole and risking fragmentation or > future ENOSPC even when the client explicitly wanted to write > zeroes rather than a hole. So it makes sense to let the new > NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES. > > Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alex Bligh <alex@alex.org.uk> +1 Alex > --- > doc/proto.md | 33 +++++++++++++++++++++++++++------ > 1 file changed, 27 insertions(+), 6 deletions(-) > > diff --git a/doc/proto.md b/doc/proto.md > index 35a3266..fb97217 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -737,8 +737,14 @@ by a sparse file. With current NBD command set, the client has to issue > through the wire. The server has to write the data onto disk, effectively > losing the sparseness. > > -To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds > -one new command and one new command flag. > +To remedy this, a `WRITE_ZEROES` extension is envisioned. This > +extension adds one new transmission flag, one new command, and one new > +command flag; and refines an existing command. > + > +* `NBD_FLAG_SEND_WRITE_ZEROES` > + > + The server SHOULD set this transmission flag to 1 if the > + `NBD_CMD_WRITE_ZEROES` request is supported. > > * `NBD_CMD_WRITE_ZEROES` > > @@ -772,12 +778,27 @@ The server SHOULD return `ENOSPC` if it receives a write zeroes request > including one or more sectors beyond the size of the device. It SHOULD > return `EPERM` if it receives a write zeroes request on a read-only export. > > +* `NBD_CMD_WRITE` > + > + By default, the server MAY search for large contiguous blocks of > + all zero content, and use trimming to zero out those portions of > + the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but > + it MUST ensure that any trimmed areas of data read back as zero. > + However, the client MAY set the command flag > + `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire > + written area MUST be fully provisioned, ensuring that future > + writes to the same area will not cause fragmentation or cause > + failure due to insufficient space. Clients SHOULD NOT set this > + flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in > + the transmisison flags. > + > The extension adds the following new command flag: > > -- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`. > - SHOULD be set to 1 if the client wants to ensure that the server does > - not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even > - if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field. > +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and > + `NBD_CMD_WRITE_ZEROES`. SHOULD be set to 1 if the client wants to > + ensure that the server does not create a hole. The client MAY send > + `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in > + the transmission flags field. > > ### `STRUCTURED_REPLY` extension > > -- > 2.5.5 >
On 04/04/2016 16:15, Eric Blake wrote: > qemu already has an existing server implementation option that will > explicitly search the payload of NBD_CMD_WRITE for large blocks of > zeroes, and punch holes in the underlying file. For old clients > that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a > workaround to keep the server's destination file approximately as > sparse as the client's source. I don't think this is the case; the flag is explicitly meant to override the client. Paolo > However, for new clients that know > how to explicitly request holes, it is unnecessary overhead; and > can lead to the server punching a hole and risking fragmentation or > future ENOSPC even when the client explicitly wanted to write > zeroes rather than a hole. So it makes sense to let the new > NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES. > > Signed-off-by: Eric Blake <eblake-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > doc/proto.md | 33 +++++++++++++++++++++++++++------ > 1 file changed, 27 insertions(+), 6 deletions(-) > > diff --git a/doc/proto.md b/doc/proto.md > index 35a3266..fb97217 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -737,8 +737,14 @@ by a sparse file. With current NBD command set, the client has to issue > through the wire. The server has to write the data onto disk, effectively > losing the sparseness. > > -To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds > -one new command and one new command flag. > +To remedy this, a `WRITE_ZEROES` extension is envisioned. This > +extension adds one new transmission flag, one new command, and one new > +command flag; and refines an existing command. > + > +* `NBD_FLAG_SEND_WRITE_ZEROES` > + > + The server SHOULD set this transmission flag to 1 if the > + `NBD_CMD_WRITE_ZEROES` request is supported. > > * `NBD_CMD_WRITE_ZEROES` > > @@ -772,12 +778,27 @@ The server SHOULD return `ENOSPC` if it receives a write zeroes request > including one or more sectors beyond the size of the device. It SHOULD > return `EPERM` if it receives a write zeroes request on a read-only export. > > +* `NBD_CMD_WRITE` > + > + By default, the server MAY search for large contiguous blocks of > + all zero content, and use trimming to zero out those portions of > + the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but > + it MUST ensure that any trimmed areas of data read back as zero. > + However, the client MAY set the command flag > + `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire > + written area MUST be fully provisioned, ensuring that future > + writes to the same area will not cause fragmentation or cause > + failure due to insufficient space. Clients SHOULD NOT set this > + flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in > + the transmisison flags. > + > The extension adds the following new command flag: > > -- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`. > - SHOULD be set to 1 if the client wants to ensure that the server does > - not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even > - if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field. > +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and > + `NBD_CMD_WRITE_ZEROES`. SHOULD be set to 1 if the client wants to > + ensure that the server does not create a hole. The client MAY send > + `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in > + the transmission flags field. > > ### `STRUCTURED_REPLY` extension >
diff --git a/doc/proto.md b/doc/proto.md index 35a3266..fb97217 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -737,8 +737,14 @@ by a sparse file. With current NBD command set, the client has to issue through the wire. The server has to write the data onto disk, effectively losing the sparseness. -To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds -one new command and one new command flag. +To remedy this, a `WRITE_ZEROES` extension is envisioned. This +extension adds one new transmission flag, one new command, and one new +command flag; and refines an existing command. + +* `NBD_FLAG_SEND_WRITE_ZEROES` + + The server SHOULD set this transmission flag to 1 if the + `NBD_CMD_WRITE_ZEROES` request is supported. * `NBD_CMD_WRITE_ZEROES` @@ -772,12 +778,27 @@ The server SHOULD return `ENOSPC` if it receives a write zeroes request including one or more sectors beyond the size of the device. It SHOULD return `EPERM` if it receives a write zeroes request on a read-only export. +* `NBD_CMD_WRITE` + + By default, the server MAY search for large contiguous blocks of + all zero content, and use trimming to zero out those portions of + the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but + it MUST ensure that any trimmed areas of data read back as zero. + However, the client MAY set the command flag + `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire + written area MUST be fully provisioned, ensuring that future + writes to the same area will not cause fragmentation or cause + failure due to insufficient space. Clients SHOULD NOT set this + flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in + the transmisison flags. + The extension adds the following new command flag: -- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`. - SHOULD be set to 1 if the client wants to ensure that the server does - not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even - if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field. +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and + `NBD_CMD_WRITE_ZEROES`. SHOULD be set to 1 if the client wants to + ensure that the server does not create a hole. The client MAY send + `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in + the transmission flags field. ### `STRUCTURED_REPLY` extension
qemu already has an existing server implementation option that will explicitly search the payload of NBD_CMD_WRITE for large blocks of zeroes, and punch holes in the underlying file. For old clients that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a workaround to keep the server's destination file approximately as sparse as the client's source. However, for new clients that know how to explicitly request holes, it is unnecessary overhead; and can lead to the server punching a hole and risking fragmentation or future ENOSPC even when the client explicitly wanted to write zeroes rather than a hole. So it makes sense to let the new NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES. Signed-off-by: Eric Blake <eblake@redhat.com> --- doc/proto.md | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-)