diff mbox series

qapi: Reintroduce CommandDisabled error class

Message ID f307829a5e64121b5cb8ad1aefff09f41cac9699.1567070002.git.mprivozn@redhat.com (mailing list archive)
State New, archived
Headers show
Series qapi: Reintroduce CommandDisabled error class | expand

Commit Message

Michal Privoznik Aug. 29, 2019, 9:14 a.m. UTC
If there was a disabled command, then qemu-ga used to report
CommandDisabled error class (among with human readable
description). This changed in v1.2.0-rc0~28^2~16 in favor of
GenericError class. While the change might work for other
classes, this one should not have been dropped because it helps
callers distinguish the root cause of the error.

A bit of background: up until very recently libvirt used qemu-ga
in all or nothing way. It didn't care why a qemu-ga command
failed. But very recently a new API was introduced which
implements 'best effort' approach (in some cases) and thus
libvirt must differentiate between: {CommandNotFound,
CommandDisabled} and some generic error. While the former classes
mean the API can issue some other commands the latter raises a
red flag causing the API to fail.

This reverts df1e608a01 partially.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 include/qapi/error.h | 1 +
 qapi/error.json      | 4 +++-
 qapi/qmp-dispatch.c  | 5 +++--
 3 files changed, 7 insertions(+), 3 deletions(-)

Comments

Markus Armbruster Aug. 29, 2019, 12:10 p.m. UTC | #1
Michal Privoznik <mprivozn@redhat.com> writes:

> If there was a disabled command, then qemu-ga used to report
> CommandDisabled error class (among with human readable
> description). This changed in v1.2.0-rc0~28^2~16 in favor of
> GenericError class.

Really?  I believe it was slightly earlier in the same series:

93b91c59db qemu-ga: switch to the new error format on the wire
de253f1491 qmp: switch to the new error format on the wire

The commit you mention (df1e608a01e) is merely follow-up simplification.

>                     While the change might work for other
> classes, this one should not have been dropped because it helps
> callers distinguish the root cause of the error.
>
> A bit of background: up until very recently libvirt used qemu-ga
> in all or nothing way. It didn't care why a qemu-ga command
> failed. But very recently a new API was introduced which
> implements 'best effort' approach (in some cases) and thus
> libvirt must differentiate between: {CommandNotFound,
> CommandDisabled} and some generic error. While the former classes
> mean the API can issue some other commands the latter raises a
> red flag causing the API to fail.

Why do you need to distinguish CommandNotFound from CommandDisabled?

> This reverts df1e608a01 partially.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik Aug. 29, 2019, 1:04 p.m. UTC | #2
On 8/29/19 2:10 PM, Markus Armbruster wrote:
> Michal Privoznik <mprivozn@redhat.com> writes:
> 
>> If there was a disabled command, then qemu-ga used to report
>> CommandDisabled error class (among with human readable
>> description). This changed in v1.2.0-rc0~28^2~16 in favor of
>> GenericError class.
> 
> Really?  I believe it was slightly earlier in the same series:
> 
> 93b91c59db qemu-ga: switch to the new error format on the wire
> de253f1491 qmp: switch to the new error format on the wire

Ah, you're right. It's the first commit that you reference.

> 
> The commit you mention (df1e608a01e) is merely follow-up simplification.
> 
>>                      While the change might work for other
>> classes, this one should not have been dropped because it helps
>> callers distinguish the root cause of the error.
>>
>> A bit of background: up until very recently libvirt used qemu-ga
>> in all or nothing way. It didn't care why a qemu-ga command
>> failed. But very recently a new API was introduced which
>> implements 'best effort' approach (in some cases) and thus
>> libvirt must differentiate between: {CommandNotFound,
>> CommandDisabled} and some generic error. While the former classes
>> mean the API can issue some other commands the latter raises a
>> red flag causing the API to fail.
> 
> Why do you need to distinguish CommandNotFound from CommandDisabled?

I don't. That's why I've put them both in curly braces. Perhaps this 
says its better:

switch (klass) {
  case CommandNotFound:
  case CommandDisabled:
        /* okay */
        break;

  default:
        /* bad, error out */
        break;
}

> 
>> This reverts df1e608a01 partially.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Michal
Eric Blake Aug. 29, 2019, 1:12 p.m. UTC | #3
On 8/29/19 8:04 AM, Michal Privoznik wrote:

>>> A bit of background: up until very recently libvirt used qemu-ga
>>> in all or nothing way. It didn't care why a qemu-ga command
>>> failed. But very recently a new API was introduced which
>>> implements 'best effort' approach (in some cases) and thus
>>> libvirt must differentiate between: {CommandNotFound,
>>> CommandDisabled} and some generic error. While the former classes
>>> mean the API can issue some other commands the latter raises a
>>> red flag causing the API to fail.
>>
>> Why do you need to distinguish CommandNotFound from CommandDisabled?
> 
> I don't. That's why I've put them both in curly braces. Perhaps this 
> says its better:
> 
> switch (klass) {
>   case CommandNotFound:
>   case CommandDisabled:
>         /* okay */
>         break;
> 

So the obvious counter-question - why not use class CommandNotFound for
a command that was disabled, rather than readding another class that has
no distinctive purpose?
Michal Privoznik Aug. 29, 2019, 1:24 p.m. UTC | #4
On 8/29/19 3:12 PM, Eric Blake wrote:
> On 8/29/19 8:04 AM, Michal Privoznik wrote:
> 
>>>> A bit of background: up until very recently libvirt used qemu-ga
>>>> in all or nothing way. It didn't care why a qemu-ga command
>>>> failed. But very recently a new API was introduced which
>>>> implements 'best effort' approach (in some cases) and thus
>>>> libvirt must differentiate between: {CommandNotFound,
>>>> CommandDisabled} and some generic error. While the former classes
>>>> mean the API can issue some other commands the latter raises a
>>>> red flag causing the API to fail.
>>>
>>> Why do you need to distinguish CommandNotFound from CommandDisabled?
>>
>> I don't. That's why I've put them both in curly braces. Perhaps this
>> says its better:
>>
>> switch (klass) {
>>    case CommandNotFound:
>>    case CommandDisabled:
>>          /* okay */
>>          break;
>>
> 
> So the obvious counter-question - why not use class CommandNotFound for
> a command that was disabled, rather than readding another class that has
> no distinctive purpose?
> 
> 

Because disabling a command is not the same as nonexistent command. 
While a command can be disabled by user/sysadmin, they are disabled at 
runtime by qemu-ga itself for a short period of time (e.g. on FS freeze 
some commands are disabled - typically those which require write disk 
access). And I guess reporting CommandNotFound for a command that does 
exist only is disabled temporarily doesn't reflect the reality, does it?

On the other hand, CommandNotFound would fix the issue for libvirt, so 
if you don't want to invent a new error class, then that's the way to go.

Michal
Markus Armbruster Aug. 30, 2019, 11:52 a.m. UTC | #5
Michal Privoznik <mprivozn@redhat.com> writes:

> On 8/29/19 3:12 PM, Eric Blake wrote:
>> On 8/29/19 8:04 AM, Michal Privoznik wrote:
>>
>>>>> A bit of background: up until very recently libvirt used qemu-ga
>>>>> in all or nothing way. It didn't care why a qemu-ga command
>>>>> failed. But very recently a new API was introduced which
>>>>> implements 'best effort' approach (in some cases) and thus
>>>>> libvirt must differentiate between: {CommandNotFound,
>>>>> CommandDisabled} and some generic error. While the former classes
>>>>> mean the API can issue some other commands the latter raises a
>>>>> red flag causing the API to fail.
>>>>
>>>> Why do you need to distinguish CommandNotFound from CommandDisabled?
>>>
>>> I don't. That's why I've put them both in curly braces. Perhaps this
>>> says its better:
>>>
>>> switch (klass) {
>>>    case CommandNotFound:
>>>    case CommandDisabled:
>>>          /* okay */
>>>          break;
>>>
>>
>> So the obvious counter-question - why not use class CommandNotFound for
>> a command that was disabled, rather than readding another class that has
>> no distinctive purpose?
>>
>>
>
> Because disabling a command is not the same as nonexistent
> command. While a command can be disabled by user/sysadmin, they are
> disabled at runtime by qemu-ga itself for a short period of time
> (e.g. on FS freeze some commands are disabled - typically those which
> require write disk access). And I guess reporting CommandNotFound for
> a command that does exist only is disabled temporarily doesn't reflect
> the reality, does it?
>
> On the other hand, CommandNotFound would fix the issue for libvirt, so
> if you don't want to invent a new error class, then that's the way to
> go.

I'm fine with changing the error to CommandNotFound.

I'm reluctant to add back CommandDisabled.  I doubt it's necessary.

To arrive at an informed opinion, I had to figure out how this command
disablement stuff works.  I can just as well send it out, so here goes.

Let's review our command disable feature.

Commands are enabled on registration, see qmp_register_command().

To disable, call qmp_disable_command().  Only qga/main.c does, in two
places:

* ga_disable_non_whitelisted(): disable all commands except for
  ga_freeze_whitelist[], which is documented as /* commands that are
  safe to issue while filesystems are frozen */

* initialize_agent(): disable blacklisted commands.  I figure these are
  the ones blacklisted with -b, plus commands blacklisted due to build
  configuration.  The latter feels inappropriate; we should use QAPI
  schema conditionals to compile them out instead (QAPI conditionals
  didn't exist when the blacklisting code was written).

Disabled commands can be re-enabled with qmp_enable_command().  Only
qga/main.c does, in ga_enable_non_blacklisted().  I figure it re-enables
the commands ga_disable_non_whitelisted() disables.  Gets called when
guest-fsfreeze-freeze freezes nothing[1], and when guest-fsfreeze-thaw
succeeds[2].

Command dispatch fails when the command is disabled, in
do_qmp_dispatch().  The proposed patch changes the error reply.

QGA's guest-info shows whether a command is disabled
(GuestAgentCommandInfo member @enabled, set in qmp_command_info()).

QMP's query-commands skips disabled commands, in query_commands_cb().
Dead, as nothing ever disables QMP commands.  Skipping feels like a bad
idea anyway.

Analysis:

There are three kinds of disabled commands: compile-time (should be
compiled out instead), permanently blacklisted with -b, temporarily
disabled while filesystems are frozen.

There are two states: thawed (first two kinds disabled) and frozen (all
three kinds disabled).

Command guest-fsfreeze-freeze[3] goes to state frozen or else fails.

Command guest-fsfreeze-thaw goes to state thawed or else fails.

guest-fsfreeze-status reports the state.

Note that the transition to frozen (and thus the temporary command
disablement) is under the control of the QGA client.  There is no
TOCTTOU between guest-info telling you which commands are disabled and
executing the next command.  My point is: the client can figure out
whether a command is disabled before executing it.

Of course, that doesn't mean we should make it figure it out.


[1] POSIX only, WTF?

[2] Except for execute_fsfreeze_hook(), which can still fail the command
    on POSIX, WTF?

[3] guest-fsfreeze-freeze's doc comment notes "The frozen state is
    limited for up to 10 seconds by VSS."  Sounds like some spontaneous
    transition back to thawed.  If this is actually true, GAState member
    @frozen is not updated to reflect the spontaneous thaw.  WTF?
Michal Privoznik Aug. 30, 2019, 1:29 p.m. UTC | #6
On 8/30/19 1:52 PM, Markus Armbruster wrote:
> Michal Privoznik <mprivozn@redhat.com> writes:
> 
>> On 8/29/19 3:12 PM, Eric Blake wrote:
>>> On 8/29/19 8:04 AM, Michal Privoznik wrote:
>>>
>>>>>> A bit of background: up until very recently libvirt used qemu-ga
>>>>>> in all or nothing way. It didn't care why a qemu-ga command
>>>>>> failed. But very recently a new API was introduced which
>>>>>> implements 'best effort' approach (in some cases) and thus
>>>>>> libvirt must differentiate between: {CommandNotFound,
>>>>>> CommandDisabled} and some generic error. While the former classes
>>>>>> mean the API can issue some other commands the latter raises a
>>>>>> red flag causing the API to fail.
>>>>>
>>>>> Why do you need to distinguish CommandNotFound from CommandDisabled?
>>>>
>>>> I don't. That's why I've put them both in curly braces. Perhaps this
>>>> says its better:
>>>>
>>>> switch (klass) {
>>>>     case CommandNotFound:
>>>>     case CommandDisabled:
>>>>           /* okay */
>>>>           break;
>>>>
>>>
>>> So the obvious counter-question - why not use class CommandNotFound for
>>> a command that was disabled, rather than readding another class that has
>>> no distinctive purpose?
>>>
>>>
>>
>> Because disabling a command is not the same as nonexistent
>> command. While a command can be disabled by user/sysadmin, they are
>> disabled at runtime by qemu-ga itself for a short period of time
>> (e.g. on FS freeze some commands are disabled - typically those which
>> require write disk access). And I guess reporting CommandNotFound for
>> a command that does exist only is disabled temporarily doesn't reflect
>> the reality, does it?
>>
>> On the other hand, CommandNotFound would fix the issue for libvirt, so
>> if you don't want to invent a new error class, then that's the way to
>> go.
> 
> I'm fine with changing the error to CommandNotFound.
> 
> I'm reluctant to add back CommandDisabled.  I doubt it's necessary.
> 
> To arrive at an informed opinion, I had to figure out how this command
> disablement stuff works.  I can just as well send it out, so here goes.
> 
> Let's review our command disable feature.
> 
> Commands are enabled on registration, see qmp_register_command().
> 
> To disable, call qmp_disable_command().  Only qga/main.c does, in two
> places:
> 
> * ga_disable_non_whitelisted(): disable all commands except for
>    ga_freeze_whitelist[], which is documented as /* commands that are
>    safe to issue while filesystems are frozen */
> 
> * initialize_agent(): disable blacklisted commands.  I figure these are
>    the ones blacklisted with -b, plus commands blacklisted due to build
>    configuration.  The latter feels inappropriate; we should use QAPI
>    schema conditionals to compile them out instead (QAPI conditionals
>    didn't exist when the blacklisting code was written).
> 
> Disabled commands can be re-enabled with qmp_enable_command().  Only
> qga/main.c does, in ga_enable_non_blacklisted().  I figure it re-enables
> the commands ga_disable_non_whitelisted() disables.  Gets called when
> guest-fsfreeze-freeze freezes nothing[1], and when guest-fsfreeze-thaw
> succeeds[2].
> 
> Command dispatch fails when the command is disabled, in
> do_qmp_dispatch().  The proposed patch changes the error reply.
> 
> QGA's guest-info shows whether a command is disabled
> (GuestAgentCommandInfo member @enabled, set in qmp_command_info()).
> 
> QMP's query-commands skips disabled commands, in query_commands_cb().
> Dead, as nothing ever disables QMP commands.  Skipping feels like a bad
> idea anyway.
> 
> Analysis:
> 
> There are three kinds of disabled commands: compile-time (should be
> compiled out instead), permanently blacklisted with -b, temporarily
> disabled while filesystems are frozen.
> 
> There are two states: thawed (first two kinds disabled) and frozen (all
> three kinds disabled).
> 
> Command guest-fsfreeze-freeze[3] goes to state frozen or else fails.
> 
> Command guest-fsfreeze-thaw goes to state thawed or else fails.
> 
> guest-fsfreeze-status reports the state.
> 
> Note that the transition to frozen (and thus the temporary command
> disablement) is under the control of the QGA client.  There is no
> TOCTTOU between guest-info telling you which commands are disabled and
> executing the next command.  My point is: the client can figure out
> whether a command is disabled before executing it.

Alright then, I'll respin with CommandNotFound. Both work for libvirt.

Michal
diff mbox series

Patch

diff --git a/include/qapi/error.h b/include/qapi/error.h
index 3f95141a01..7116b86a92 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -129,6 +129,7 @@ 
 typedef enum ErrorClass {
     ERROR_CLASS_GENERIC_ERROR = QAPI_ERROR_CLASS_GENERICERROR,
     ERROR_CLASS_COMMAND_NOT_FOUND = QAPI_ERROR_CLASS_COMMANDNOTFOUND,
+    ERROR_CLASS_COMMAND_DISABLED = QAPI_ERROR_CLASS_COMMANDDISABLED,
     ERROR_CLASS_DEVICE_NOT_ACTIVE = QAPI_ERROR_CLASS_DEVICENOTACTIVE,
     ERROR_CLASS_DEVICE_NOT_FOUND = QAPI_ERROR_CLASS_DEVICENOTFOUND,
     ERROR_CLASS_KVM_MISSING_CAP = QAPI_ERROR_CLASS_KVMMISSINGCAP,
diff --git a/qapi/error.json b/qapi/error.json
index 3fad08f506..334d481399 100644
--- a/qapi/error.json
+++ b/qapi/error.json
@@ -14,6 +14,8 @@ 
 #
 # @CommandNotFound: the requested command has not been found
 #
+# @CommandDisabled: the requested command has been disabled
+#
 # @DeviceNotActive: a device has failed to be become active
 #
 # @DeviceNotFound: the requested device has not been found
@@ -25,5 +27,5 @@ 
 ##
 { 'enum': 'QapiErrorClass',
   # Keep this in sync with ErrorClass in error.h
-  'data': [ 'GenericError', 'CommandNotFound',
+  'data': [ 'GenericError', 'CommandNotFound', 'CommandDisabled',
             'DeviceNotActive', 'DeviceNotFound', 'KVMMissingCap' ] }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 3037d353a4..913b3363cb 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -104,8 +104,9 @@  static QObject *do_qmp_dispatch(QmpCommandList *cmds, QObject *request,
         return NULL;
     }
     if (!cmd->enabled) {
-        error_setg(errp, "The command %s has been disabled for this instance",
-                   command);
+        error_set(errp, ERROR_CLASS_COMMAND_DISABLED,
+                  "The command %s has been disabled for this instance",
+                  command);
         return NULL;
     }
     if (oob && !(cmd->options & QCO_ALLOW_OOB)) {