diff mbox series

[v3,2/9] qapi: make blockdev-add a coroutine command

Message ID 20210906190654.183421-3-vsementsov@virtuozzo.com (mailing list archive)
State New, archived
Headers show
Series nbd reconnect on open | expand

Commit Message

Vladimir Sementsov-Ogievskiy Sept. 6, 2021, 7:06 p.m. UTC
We are going to support nbd reconnect on open in a next commit. This
means that we want to do several connection attempts during some time.
And this should be done in a coroutine, otherwise we'll stuck.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Markus Armbruster Sept. 6, 2021, 7:28 p.m. UTC | #1
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:

> We are going to support nbd reconnect on open in a next commit. This
> means that we want to do several connection attempts during some time.
> And this should be done in a coroutine, otherwise we'll stuck.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 06674c25c9..6e4042530a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4219,7 +4219,8 @@
>  # <- { "return": {} }
>  #
>  ##
> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
> +  'coroutine': true }
>  
>  ##
>  # @blockdev-reopen:

Why is this safe?

Prior discusson:
Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
Vladimir Sementsov-Ogievskiy Sept. 6, 2021, 9:40 p.m. UTC | #2
06.09.2021 22:28, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
> 
>> We are going to support nbd reconnect on open in a next commit. This
>> means that we want to do several connection attempts during some time.
>> And this should be done in a coroutine, otherwise we'll stuck.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 06674c25c9..6e4042530a 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -4219,7 +4219,8 @@
>>   # <- { "return": {} }
>>   #
>>   ##
>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>> +  'coroutine': true }
>>   
>>   ##
>>   # @blockdev-reopen:
> 
> Why is this safe?
> 
> Prior discusson:
> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
> 

Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..

On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.

And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?
Markus Armbruster Sept. 7, 2021, 6:14 a.m. UTC | #3
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:

> 06.09.2021 22:28, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
>> 
>>> We are going to support nbd reconnect on open in a next commit. This
>>> means that we want to do several connection attempts during some time.
>>> And this should be done in a coroutine, otherwise we'll stuck.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   qapi/block-core.json | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 06674c25c9..6e4042530a 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -4219,7 +4219,8 @@
>>>   # <- { "return": {} }
>>>   #
>>>   ##
>>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>>> +  'coroutine': true }
>>>   
>>>   ##
>>>   # @blockdev-reopen:
>> 
>> Why is this safe?
>> 
>> Prior discusson:
>> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
>> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
>> 
>
> Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..

Breaking existing block drivers is more serious than adding new drivers
broken.

> On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.
>
> And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?

If it's already used in ways that require coroutine-compatibility, we'd
merely add another way for existing bugs to bite.  Kevin, is it?

In general, the less coroutine-incompatibility we have, the better.
From the thread I quoted:

    Kevin Wolf <kwolf@redhat.com> writes:

    > AM 22.01.2020 um 13:15 hat Markus Armbruster geschrieben:
    [...]
    >> Is coroutine-incompatible command handler code necessary or accidental?
    >> 
    >> By "necessary" I mean there are (and likely always will be) commands
    >> that need to do stuff that cannot or should not be done on coroutine
    >> context.
    >> 
    >> "Accidental" is the opposite: coroutine-incompatibility can be regarded
    >> as a fixable flaw.
    >
    > I expect it's mostly accidental, but maybe not all of it.

I'm inclinded to regard accidental coroutine-incompatibility as a bug.

As long as the command doesn't have the coroutine flag set, it's a
latent bug.

Setting the coroutine flag without auditing the code risks making such
latent bugs actual bugs.  A typical outcome is deadlock.

Whether we're willing to accept such risk is not for me to decide.

We weren't when we merged the coroutine work, at least not wholesale.
The risk is perhaps less scary for a single command.  On the other hand,
code audit is less daunting, too.

Kevin, what do you think?
Vladimir Sementsov-Ogievskiy Sept. 27, 2021, 11:25 a.m. UTC | #4
07.09.2021 09:14, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
> 
>> 06.09.2021 22:28, Markus Armbruster wrote:
>>> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
>>>
>>>> We are going to support nbd reconnect on open in a next commit. This
>>>> means that we want to do several connection attempts during some time.
>>>> And this should be done in a coroutine, otherwise we'll stuck.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    qapi/block-core.json | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 06674c25c9..6e4042530a 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -4219,7 +4219,8 @@
>>>>    # <- { "return": {} }
>>>>    #
>>>>    ##
>>>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>>>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>>>> +  'coroutine': true }
>>>>    
>>>>    ##
>>>>    # @blockdev-reopen:
>>>
>>> Why is this safe?
>>>
>>> Prior discusson:
>>> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
>>>
>>
>> Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..
> 
> Breaking existing block drivers is more serious than adding new drivers
> broken.
> 
>> On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.
>>
>> And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?
> 
> If it's already used in ways that require coroutine-compatibility, we'd
> merely add another way for existing bugs to bite.  Kevin, is it?
> 
> In general, the less coroutine-incompatibility we have, the better.
>  From the thread I quoted:
> 
>      Kevin Wolf <kwolf@redhat.com> writes:
> 
>      > AM 22.01.2020 um 13:15 hat Markus Armbruster geschrieben:
>      [...]
>      >> Is coroutine-incompatible command handler code necessary or accidental?
>      >>
>      >> By "necessary" I mean there are (and likely always will be) commands
>      >> that need to do stuff that cannot or should not be done on coroutine
>      >> context.
>      >>
>      >> "Accidental" is the opposite: coroutine-incompatibility can be regarded
>      >> as a fixable flaw.
>      >
>      > I expect it's mostly accidental, but maybe not all of it.
> 
> I'm inclinded to regard accidental coroutine-incompatibility as a bug.
> 
> As long as the command doesn't have the coroutine flag set, it's a
> latent bug.
> 
> Setting the coroutine flag without auditing the code risks making such
> latent bugs actual bugs.  A typical outcome is deadlock.
> 
> Whether we're willing to accept such risk is not for me to decide.
> 
> We weren't when we merged the coroutine work, at least not wholesale.
> The risk is perhaps less scary for a single command.  On the other hand,
> code audit is less daunting, too.
> 
> Kevin, what do you think?
> 


Any thoughts?

I think, we can simply proceed without this patch. That just means that blockdev-add remains blocking, and using it to add nbd node with long open-timeout when guest is running [*] will be inconvenient (we don't want to block the running guest). Still commandline interface, and running blockdev-add when guest is paused is OK.

Anyway, this case [*] is not super useful: OK, guest isn't blocked, but you can't issue more qmp commands until open-timeout passed. That's not very convenient for running vm.

Side thought: don't we have/plan async qmp commands or something like this? So, that command is started in a coroutine, but user don't have to wait for finish to run more QMP commands? It should be more useful for command that may take long time. We have jobs, but implementing new job for such command seems too heavy.
diff mbox series

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06674c25c9..6e4042530a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4219,7 +4219,8 @@ 
 # <- { "return": {} }
 #
 ##
-{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
+{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
+  'coroutine': true }
 
 ##
 # @blockdev-reopen: