diff mbox series

[v1,2/2] system/cpus: Fix resume_all_vcpus() under vCPU hotplug condition

Message ID 20240317083704.23244-3-zhukeqian1@huawei.com (mailing list archive)
State New, archived
Headers show
Series Some fixes for pause and resume all vcpus | expand

Commit Message

zhukeqian March 17, 2024, 8:37 a.m. UTC
For vCPU being hotplugged, qemu_init_vcpu() is called. In this
function, we set vcpu state as stopped, and then wait vcpu thread
to be created.

As the vcpu state is stopped, it will inform us it has been created
and then wait on halt_cond. After we has realized vcpu object, we
will resume the vcpu thread.

However, during we wait vcpu thread to be created, the bql is
unlocked, and other thread is allowed to call resume_all_vcpus(),
which will resume the un-realized vcpu.

This fixes the issue by filter out un-realized vcpu during
resume_all_vcpus().

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 system/cpus.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

David Hildenbrand March 18, 2024, 10:14 a.m. UTC | #1
On 17.03.24 09:37, Keqian Zhu via wrote:
> For vCPU being hotplugged, qemu_init_vcpu() is called. In this
> function, we set vcpu state as stopped, and then wait vcpu thread
> to be created.
> 
> As the vcpu state is stopped, it will inform us it has been created
> and then wait on halt_cond. After we has realized vcpu object, we
> will resume the vcpu thread.
> 
> However, during we wait vcpu thread to be created, the bql is
> unlocked, and other thread is allowed to call resume_all_vcpus(),
> which will resume the un-realized vcpu.
> 
> This fixes the issue by filter out un-realized vcpu during
> resume_all_vcpus().

Similar question: is there a reproducer?

How could we currently hotplug a VCPU, and while it is being created, 
see pause_all_vcpus()/resume_all_vcpus() getting claled.

If I am not getting this wrong, there seems to be some other mechanism 
missing that makes sure that this cannot happen. Dropping the BQL 
half-way through creating a VCPU might be the problem.
zhukeqian March 19, 2024, 5:11 a.m. UTC | #2
Hi David,

On 17.03.24 09:37, Keqian Zhu via wrote:
>> For vCPU being hotplugged, qemu_init_vcpu() is called. In this 
>> function, we set vcpu state as stopped, and then wait vcpu thread to 
>> be created.
>> 
>> As the vcpu state is stopped, it will inform us it has been created 
>> and then wait on halt_cond. After we has realized vcpu object, we will 
>> resume the vcpu thread.
>> 
>> However, during we wait vcpu thread to be created, the bql is 
>> unlocked, and other thread is allowed to call resume_all_vcpus(), 
>> which will resume the un-realized vcpu.
>> 
>> This fixes the issue by filter out un-realized vcpu during 
>> resume_all_vcpus().
>
>Similar question: is there a reproducer? 
>
>How could we currently hotplug a VCPU, and while it is being created, see pause_all_vcpus()/resume_all_vcpus() getting claled. 
>
I described the reason for this at patch 1.

>If I am not getting this wrong, there seems to be some other mechanism missing that makes sure that this cannot happen. Dropping the BQL half-way through creating a VCPU might be the problem.
>
When we add retry mechanism in pause_all_vcpus(), we can solve this problem. With the sematic unchanged for user, which means:
With bql, we can make sure all vcpus are paused after pause_all_vcpus() finish,  and all vcpus are resumed after resume_all_vcpus() finish.

Thanks,
Keqian

>
>
--
Cheers,

David / dhildenb
David Hildenbrand March 19, 2024, 9:25 a.m. UTC | #3
On 19.03.24 06:11, zhukeqian wrote:
> Hi David,
> 
> On 17.03.24 09:37, Keqian Zhu via wrote:
>>> For vCPU being hotplugged, qemu_init_vcpu() is called. In this
>>> function, we set vcpu state as stopped, and then wait vcpu thread to
>>> be created.
>>>
>>> As the vcpu state is stopped, it will inform us it has been created
>>> and then wait on halt_cond. After we has realized vcpu object, we will
>>> resume the vcpu thread.
>>>
>>> However, during we wait vcpu thread to be created, the bql is
>>> unlocked, and other thread is allowed to call resume_all_vcpus(),
>>> which will resume the un-realized vcpu.
>>>
>>> This fixes the issue by filter out un-realized vcpu during
>>> resume_all_vcpus().
>>
>> Similar question: is there a reproducer?
>>
>> How could we currently hotplug a VCPU, and while it is being created, see pause_all_vcpus()/resume_all_vcpus() getting claled.
>>
> I described the reason for this at patch 1.
> 
>> If I am not getting this wrong, there seems to be some other mechanism missing that makes sure that this cannot happen. Dropping the BQL half-way through creating a VCPU might be the problem.
>>
> When we add retry mechanism in pause_all_vcpus(), we can solve this problem. With the sematic unchanged for user, which means:
> With bql, we can make sure all vcpus are paused after pause_all_vcpus() finish,  and all vcpus are resumed after resume_all_vcpus() finish.

Okay, got it. As just replied to #1, please see if you can avoid messing 
with pause_all_vcpus() by inhibiting KVM IOCTLs like KVM does. That 
would be preferable.
diff mbox series

Patch

diff --git a/system/cpus.c b/system/cpus.c
index 4e41abe23e..8871f5dfa9 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -638,6 +638,9 @@  void resume_all_vcpus(void)
 
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
     CPU_FOREACH(cpu) {
+        if (!object_property_get_bool(OBJECT(cpu), "realized", &error_abort)) {
+            continue;
+        }
         cpu_resume(cpu);
     }
 }