diff mbox

Improve documentation of FUA and FLUSH

Message ID 1459465399-56203-1-git-send-email-alex@alex.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Bligh March 31, 2016, 11:03 p.m. UTC
Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
the latter may be set on any command, and its semantics on commands other
than NBD_CMD_WRITE need explaining. Further, explain how these relate to
reordering of commands.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 doc/proto.md | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)

Comments

Eric Blake April 1, 2016, 3:23 a.m. UTC | #1
On 03/31/2016 05:03 PM, Alex Bligh wrote:
> Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
> the latter may be set on any command, and its semantics on commands other
> than NBD_CMD_WRITE need explaining. Further, explain how these relate to
> reordering of commands.
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> ---
>  doc/proto.md | 52 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index c1e05c5..bc4483d 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -197,6 +197,37 @@ handle as was sent by the client in the corresponding request.  In
>  this way, the client can correlate which request is receiving a
>  response.
>  
> +#### Ordering of messages and writes
> +
> +The server MAY process commands out of order, and MAY reply out of
> +order, save that:
> +
> +* All write commands (that includes both `NBD_CMD_WRITE` and
> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> +  (whether they are replied to or not) are written to non-volatile
> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> +  stronger condition than the previous 'MUST' condition. This
> +  paragram only applies if `NBD_FLAG_SEND_FLUSH` is set within

s/paragram/paragraph/

> +  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
> +  be sent by the client to the server.
> +
> +* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
> +  in its command flags until   the data area referred to by that command

why multiple spaces?

> +  is persisted to non-volatile storage. This only applies if
> +  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise

s/FLUSH/FUA/

> +  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
> +  by the client.
> +
> +`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
> +`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
> +kernel bio with `REQ_FUA` set. In case of ambiguity in this
> +specification, the
> +[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
> +may be useful.
> +
>  #### Request message
>  
>  The request message, sent by the client, looks as follows:
> @@ -444,10 +475,17 @@ affects a particular command.  Clients MUST NOT set a command flag bit
>  that is not documented for the particular command; and whether a flag is
>  valid may depend on negotiation during the handshake phase.
>  
> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`.  SHOULD be
> -  set to 1 if the client requires "Force Unit Access" mode of
> -  operation.  MUST NOT be set unless transmission flags included
> -  `NBD_FLAG_SEND_FUA`.
> +- bit 0, `NBD_CMD_FLAG_FUA`. This bit
> +  MUST be set to 0 unless the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access")
> +  was set in the transmission flags field. If the `NBD_FLAG_SEND_FUA`
> +  is set in the transmission flags field, the client MAY set
> +  `NBD_CMD_FLAG_FUA` in any request. If this bit is set, the server
> +  MUST NOT send a reply until it has ensured that any data referred to
> +  by this request (i.e. written data on a write or trim, read data on
> +  a read) has reached permanent storage. There will be certain commands
> +  (e.g. `NBD_CMD_DISC`) for which this flag will thus not alter behaviour
> +  (as the command does not refer to any data), in which case the server
> +  MUST ignore this bit.

Makes sense, but we now need to fix the reference implementation  to
match (the recent commit ab22e082 rejects NBD_CMD_FLAG_FUA on all but
writes).
Wouter Verhelst April 1, 2016, 8:35 a.m. UTC | #2
On Fri, Apr 01, 2016 at 12:03:19AM +0100, Alex Bligh wrote:
> Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
> the latter may be set on any command, and its semantics on commands other
> than NBD_CMD_WRITE need explaining. Further, explain how these relate to
> reordering of commands.
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> ---
>  doc/proto.md | 52 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index c1e05c5..bc4483d 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -197,6 +197,37 @@ handle as was sent by the client in the corresponding request.  In
>  this way, the client can correlate which request is receiving a
>  response.
>  
> +#### Ordering of messages and writes
> +
> +The server MAY process commands out of order, and MAY reply out of
> +order, save that:
> +
> +* All write commands (that includes both `NBD_CMD_WRITE` and
> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> +  (whether they are replied to or not) are written to non-volatile
> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> +  stronger condition than the previous 'MUST' condition. This

This seems to make little sense. Are you saying that suddenly now
sending a reply for FLUSH with outstanding writes is wrong? If not, the
above should be clarified.
Alex Bligh April 1, 2016, 9:28 a.m. UTC | #3
On 1 Apr 2016, at 09:35, Wouter Verhelst <w@uter.be> wrote:

>> +* All write commands (that includes both `NBD_CMD_WRITE` and
>> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
>> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
>> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
>> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
>> +  (whether they are replied to or not) are written to non-volatile
>> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
>> +  stronger condition than the previous 'MUST' condition. This
> 
> This seems to make little sense. Are you saying that suddenly now
> sending a reply for FLUSH with outstanding writes is wrong? If not, the
> above should be clarified.

The MUST sentence does not cover that situation as it only refers
to completed writes.

The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
of writes that have PROCESSED (i.e actioned) whether or not they
have been replied to. Of course the client has no way of knowing
whether they have been PROCESSED without a reply.

Personally I think the SHOULD clause is pretty pointless and is
unnecessary, but that's where the conversation got to n years
ago I believe.

Happy to delete the last sentence if that's wrong.
Wouter Verhelst April 1, 2016, 10:10 a.m. UTC | #4
On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote:
> 
> On 1 Apr 2016, at 09:35, Wouter Verhelst <w@uter.be> wrote:
> 
> >> +* All write commands (that includes both `NBD_CMD_WRITE` and
> >> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> >> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> >> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
> >> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> >> +  (whether they are replied to or not) are written to non-volatile
> >> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> >> +  stronger condition than the previous 'MUST' condition. This
> > 
> > This seems to make little sense. Are you saying that suddenly now
> > sending a reply for FLUSH with outstanding writes is wrong? If not, the
> > above should be clarified.
> 
> The MUST sentence does not cover that situation as it only refers
> to completed writes.
> 
> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
> of writes that have PROCESSED (i.e actioned) whether or not they
> have been replied to. Of course the client has no way of knowing
> whether they have been PROCESSED without a reply.
> 
> Personally I think the SHOULD clause is pretty pointless and is
> unnecessary, but that's where the conversation got to n years
> ago I believe.

I'm still not sure what it's supposed to mean, though. Clearly, you
should at the very least reword it, if not...

> Happy to delete the last sentence if that's wrong.

... remove it instead.
Alex Bligh April 1, 2016, 10:17 a.m. UTC | #5
On 1 Apr 2016, at 11:10, Wouter Verhelst <w@uter.be> wrote:

> On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote:
>> 
>> On 1 Apr 2016, at 09:35, Wouter Verhelst <w@uter.be> wrote:
>> 
>>>> +* All write commands (that includes both `NBD_CMD_WRITE` and
>>>> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
>>>> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
>>>> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
>>>> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
>>>> +  (whether they are replied to or not) are written to non-volatile
>>>> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
>>>> +  stronger condition than the previous 'MUST' condition. This
>>> 
>>> This seems to make little sense. Are you saying that suddenly now
>>> sending a reply for FLUSH with outstanding writes is wrong? If not, the
>>> above should be clarified.
>> 
>> The MUST sentence does not cover that situation as it only refers
>> to completed writes.
>> 
>> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
>> of writes that have PROCESSED (i.e actioned) whether or not they
>> have been replied to. Of course the client has no way of knowing
>> whether they have been PROCESSED without a reply.
>> 
>> Personally I think the SHOULD clause is pretty pointless and is
>> unnecessary, but that's where the conversation got to n years
>> ago I believe.
> 
> I'm still not sure what it's supposed to mean, though. Clearly, you
> should at the very least reword it, if not...
> 
>> Happy to delete the last sentence if that's wrong.
> 
> ... remove it instead.

If I can't even explain it, it doesn't bode well!

I think there are three types of writes that are relevant at the point
of replying to a FLUSH:

Type A: writes that are truly 'in flight', i.e. have been sent, but have
not been 'processed' (i.e. write(2) has not been called).

Type B: writes that are have been processed (i.e. write(2) has not been
called) but the reply has not yet been sent to the client - either the
reply hasn't been made yet or (more likely) it's sitting in some queue.

Type C: writes that have been processed and a reply sent.


The first sentence (the 'MUST') refers to Type C writes and says these
MUST be persisted to permanent storage prior to replying to the FLUSH.

Type A writes may be in a buffer server side or even client side, so
no one expects anything to happen with respect to those.

The 'SHOULD' bit concerns Type B writes. It's saying that if you've
actually done the write call (but not yet replied) you SHOULD persist
these to disk prior to replying to the FLUSH. This type of situation
doesn't (last time I looked) happen in nbd-server.c because it does
not process requests in parallel. However it may happen elsewhere.
The question is what to do with them.

I'm happy dropping anything about these writes and only dealing with
Type C writes, but the 'SHOULD' is where I thought we got to before
(I think I sent the quote from the mailing list).
Wouter Verhelst April 1, 2016, 11:11 a.m. UTC | #6
On Fri, Apr 01, 2016 at 11:17:29AM +0100, Alex Bligh wrote:
> On 1 Apr 2016, at 11:10, Wouter Verhelst <w@uter.be> wrote:
> > On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote:
> >> On 1 Apr 2016, at 09:35, Wouter Verhelst <w@uter.be> wrote:
> >>>> +* All write commands (that includes both `NBD_CMD_WRITE` and
> >>>> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> >>>> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> >>>> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
> >>>> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> >>>> +  (whether they are replied to or not) are written to non-volatile
> >>>> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> >>>> +  stronger condition than the previous 'MUST' condition. This
> >>> 
> >>> This seems to make little sense. Are you saying that suddenly now
> >>> sending a reply for FLUSH with outstanding writes is wrong? If not, the
> >>> above should be clarified.
> >> 
> >> The MUST sentence does not cover that situation as it only refers
> >> to completed writes.
> >> 
> >> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
> >> of writes that have PROCESSED (i.e actioned) whether or not they
> >> have been replied to. Of course the client has no way of knowing
> >> whether they have been PROCESSED without a reply.
> >> 
> >> Personally I think the SHOULD clause is pretty pointless and is
> >> unnecessary, but that's where the conversation got to n years
> >> ago I believe.
> > 
> > I'm still not sure what it's supposed to mean, though. Clearly, you
> > should at the very least reword it, if not...
> > 
> >> Happy to delete the last sentence if that's wrong.
> > 
> > ... remove it instead.
> 
> If I can't even explain it, it doesn't bode well!

:-)

> I think there are three types of writes that are relevant at the point
> of replying to a FLUSH:
> 
> Type A: writes that are truly 'in flight', i.e. have been sent, but have
> not been 'processed' (i.e. write(2) has not been called).

Right.

> Type B: writes that are have been processed (i.e. write(2) has not been

ITYM s/not// here?

> called) but the reply has not yet been sent to the client - either the
> reply hasn't been made yet or (more likely) it's sitting in some queue.
>
> Type C: writes that have been processed and a reply sent.
> 
> 
> The first sentence (the 'MUST') refers to Type C writes and says these
> MUST be persisted to permanent storage prior to replying to the FLUSH.
> 
> Type A writes may be in a buffer server side or even client side, so
> no one expects anything to happen with respect to those.
> 
> The 'SHOULD' bit concerns Type B writes. It's saying that if you've
> actually done the write call (but not yet replied) you SHOULD persist
> these to disk prior to replying to the FLUSH. This type of situation
> doesn't (last time I looked) happen in nbd-server.c because it does
> not process requests in parallel.

Actually, it does, since a few months or so :-)

> However it may happen elsewhere.  The question is what to do with them.

Right.

As far as the client (and as a direct result, the protocol) is concerned,
however, there is no substantial difference between type A and type B.
The request has been sent, and the reply hasn't been received yet. This
is the same for A as well as for B.

I don't think it is at all useful to tell the server what it should do
in that situation. If the server implements FLUSH with fsync(), then
type B replies are handled exactly like type C replies. If the server
implements FLUSH with something else, however, then flushing type B
replies could potentially be much more expensive then flushing type C
replies.

I think this is one of those cases where the protocol spec simply shouldn't
say anything (at most, the protocol spec could explicitly say that it
doesn't prescribe any particular behaviour). Requests that have been
sent to the server MAY be flushed by a FLUSH request, but MAY equally
well not be. There is no reason to use SHOULD regarding anything about
that.

(I believe that is what the spec currently says, fwiw, but I didn't
check and may be wrong)
Alex Bligh April 1, 2016, 11:56 a.m. UTC | #7
On 1 Apr 2016, at 12:11, Wouter Verhelst <w@uter.be> wrote:

> I don't think it is at all useful to tell the server what it should do
> in that situation.

Agree. I will delete it.
diff mbox

Patch

diff --git a/doc/proto.md b/doc/proto.md
index c1e05c5..bc4483d 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -197,6 +197,37 @@  handle as was sent by the client in the corresponding request.  In
 this way, the client can correlate which request is receiving a
 response.
 
+#### Ordering of messages and writes
+
+The server MAY process commands out of order, and MAY reply out of
+order, save that:
+
+* All write commands (that includes both `NBD_CMD_WRITE` and
+  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
+  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
+  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
+  that all write command received prior to processing the `NBD_CMD_FLUSH`
+  (whether they are replied to or not) are written to non-volatile
+  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
+  stronger condition than the previous 'MUST' condition. This
+  paragram only applies if `NBD_FLAG_SEND_FLUSH` is set within
+  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
+  be sent by the client to the server.
+
+* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
+  in its command flags until   the data area referred to by that command
+  is persisted to non-volatile storage. This only applies if
+  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise
+  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
+  by the client.
+
+`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
+`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
+kernel bio with `REQ_FUA` set. In case of ambiguity in this
+specification, the
+[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
+may be useful.
+
 #### Request message
 
 The request message, sent by the client, looks as follows:
@@ -444,10 +475,17 @@  affects a particular command.  Clients MUST NOT set a command flag bit
 that is not documented for the particular command; and whether a flag is
 valid may depend on negotiation during the handshake phase.
 
-- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`.  SHOULD be
-  set to 1 if the client requires "Force Unit Access" mode of
-  operation.  MUST NOT be set unless transmission flags included
-  `NBD_FLAG_SEND_FUA`.
+- bit 0, `NBD_CMD_FLAG_FUA`. This bit
+  MUST be set to 0 unless the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access")
+  was set in the transmission flags field. If the `NBD_FLAG_SEND_FUA`
+  is set in the transmission flags field, the client MAY set
+  `NBD_CMD_FLAG_FUA` in any request. If this bit is set, the server
+  MUST NOT send a reply until it has ensured that any data referred to
+  by this request (i.e. written data on a write or trim, read data on
+  a read) has reached permanent storage. There will be certain commands
+  (e.g. `NBD_CMD_DISC`) for which this flag will thus not alter behaviour
+  (as the command does not refer to any data), in which case the server
+  MUST ignore this bit.
 
 #### Request types
 
@@ -479,12 +517,6 @@  The following request types exist:
     message. The server MAY send the reply message before the data has
     reached permanent storage.
 
-    If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
-    transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in
-    the command flags field. If this flag was set, the server MUST NOT send
-    the reply until it has ensured that the newly-written data has reached
-    permanent storage.
-
     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MAY then close the connection.