diff mbox

[2/2] migration: send and check the devices between source and distination at the begining

Message ID 1475147192-31228-2-git-send-email-lizhijian@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Li Zhijian Sept. 29, 2016, 11:06 a.m. UTC
Priviously, if the source and distination have different devices, source could goto
the status "paused (postmigrate)", and the distination will exit that means no qemu
is alive.

After this patch, at above case, source can dectect the some error early from distination
and stop the migration, source keep in status "running".

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/savevm.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Amit Shah Sept. 30, 2016, 6:15 a.m. UTC | #1
Hi,

On (Thu) 29 Sep 2016 [19:06:32], Li Zhijian wrote:
> Priviously, if the source and distination have different devices, source could goto
> the status "paused (postmigrate)", and the distination will exit that means no qemu
> is alive.
> 
> After this patch, at above case, source can dectect the some error early from distination
> and stop the migration, source keep in status "running".

How would incoming migrations from previous versions work?


		Amit
Li Zhijian Sept. 30, 2016, 7:53 a.m. UTC | #2
On 09/30/2016 02:15 PM, Amit Shah wrote:
> Hi,
>
> On (Thu) 29 Sep 2016 [19:06:32], Li Zhijian wrote:
>> Priviously, if the source and distination have different devices, source could goto
>> the status "paused (postmigrate)", and the distination will exit that means no qemu
>> is alive.
>>
>> After this patch, at above case, source can dectect the some error early from distination
>> and stop the migration, source keep in status "running".
>
> How would incoming migrations from previous versions work?

You are right. we need to consider more.

How about that:
we need to introduce a new section type(e.g: QEMU_VM_SECTION_DEVICE_LIST).

source side:
- at the beginning of qemu_savevm_state_begin(), send QEMU_VM_SECTION_DEVICE_LIST first
- original path

dst side:
- if we got the QEMU_VM_SECTION_DEVICE_LIST, have a check with the devices(name,version)
- otherwise original path

Please correct me.

Thanks
Zhijian

>
>
> 		Amit
>
>
>
Dr. David Alan Gilbert Sept. 30, 2016, 9:58 a.m. UTC | #3
* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> 
> 
> On 09/30/2016 02:15 PM, Amit Shah wrote:
> > Hi,
> > 
> > On (Thu) 29 Sep 2016 [19:06:32], Li Zhijian wrote:
> > > Priviously, if the source and distination have different devices, source could goto
> > > the status "paused (postmigrate)", and the distination will exit that means no qemu
> > > is alive.
> > > 
> > > After this patch, at above case, source can dectect the some error early from distination
> > > and stop the migration, source keep in status "running".
> > 
> > How would incoming migrations from previous versions work?
> 
> You are right. we need to consider more.
> 
> How about that:
> we need to introduce a new section type(e.g: QEMU_VM_SECTION_DEVICE_LIST).
> 
> source side:
> - at the beginning of qemu_savevm_state_begin(), send QEMU_VM_SECTION_DEVICE_LIST first
> - original path
> 
> dst side:
> - if we got the QEMU_VM_SECTION_DEVICE_LIST, have a check with the devices(name,version)
> - otherwise original path

Yes, and only send it on new machine types.
I think a list could be a good idea; I don't worry too much about the 'paused (postmigrate)'
case, because libvirt can spot that the destination failed and then restart the source,
however a list would also fix the opposite case; where the destination has an extra device
that the source does not have, in most cases I think that doesn't cause a failure at the moment
but the device has an unusual state.

A QEMU_VM_SECTION_DEVICE_LIST would work, but perhaps a subsection of vmstate_globalstate
would work?

Dave

> 
> Please correct me.
> 
> Thanks
> Zhijian
> 
> > 
> > 
> > 		Amit
> > 
> > 
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Li Zhijian Oct. 6, 2016, 2:51 a.m. UTC | #4
On 09/30/2016 05:58 PM, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>>
>>
>> On 09/30/2016 02:15 PM, Amit Shah wrote:
>>> Hi,
>>>
>>> On (Thu) 29 Sep 2016 [19:06:32], Li Zhijian wrote:
>>>> Priviously, if the source and distination have different devices, source could goto
>>>> the status "paused (postmigrate)", and the distination will exit that means no qemu
>>>> is alive.
>>>>
>>>> After this patch, at above case, source can dectect the some error early from distination
>>>> and stop the migration, source keep in status "running".
>>>
>>> How would incoming migrations from previous versions work?
>>
>> You are right. we need to consider more.
>>
>> How about that:
>> we need to introduce a new section type(e.g: QEMU_VM_SECTION_DEVICE_LIST).
>>
>> source side:
>> - at the beginning of qemu_savevm_state_begin(), send QEMU_VM_SECTION_DEVICE_LIST first
>> - original path
>>
>> dst side:
>> - if we got the QEMU_VM_SECTION_DEVICE_LIST, have a check with the devices(name,version)
>> - otherwise original path
>
> Yes, and only send it on new machine types.
> I think a list could be a good idea; I don't worry too much about the 'paused (postmigrate)'
> case, because libvirt can spot that the destination failed and then restart the source,
Oh, that's what I didn't know before. Thank for your information.


> however a list would also fix the opposite case; where the destination has an extra device
> that the source does not have, in most cases I think that doesn't cause a failure at the moment
What's our expected behavior? I think in most cases(where the destination has an extra device
that the source does not have), destination seems fine after migration.

If we expect migration failure, it needs more check.


> but the device has an unusual state.
>
> A QEMU_VM_SECTION_DEVICE_LIST would work, but perhaps a subsection of vmstate_globalstate
> would work?

Currently, the vmstate_globalstate was saved/restored at the migration completion stage while I expect
the beginning of migration.

And i cook the V2, with a new section type called SECTION_HEADER, any comment is welcomed.

Thanks

>
> Dave
>
>>
>> Please correct me.
>>
>> Thanks
>> Zhijian
>>
>>>
>>>
>>> 		Amit
>>>
>>>
>>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>
Dr. David Alan Gilbert Oct. 6, 2016, 12:17 p.m. UTC | #5
* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> 
> 
> On 09/30/2016 05:58 PM, Dr. David Alan Gilbert wrote:
> > * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> > > 
> > > 
> > > On 09/30/2016 02:15 PM, Amit Shah wrote:
> > > > Hi,
> > > > 
> > > > On (Thu) 29 Sep 2016 [19:06:32], Li Zhijian wrote:
> > > > > Priviously, if the source and distination have different devices, source could goto
> > > > > the status "paused (postmigrate)", and the distination will exit that means no qemu
> > > > > is alive.
> > > > > 
> > > > > After this patch, at above case, source can dectect the some error early from distination
> > > > > and stop the migration, source keep in status "running".
> > > > 
> > > > How would incoming migrations from previous versions work?
> > > 
> > > You are right. we need to consider more.
> > > 
> > > How about that:
> > > we need to introduce a new section type(e.g: QEMU_VM_SECTION_DEVICE_LIST).
> > > 
> > > source side:
> > > - at the beginning of qemu_savevm_state_begin(), send QEMU_VM_SECTION_DEVICE_LIST first
> > > - original path
> > > 
> > > dst side:
> > > - if we got the QEMU_VM_SECTION_DEVICE_LIST, have a check with the devices(name,version)
> > > - otherwise original path
> > 
> > Yes, and only send it on new machine types.
> > I think a list could be a good idea; I don't worry too much about the 'paused (postmigrate)'
> > case, because libvirt can spot that the destination failed and then restart the source,
> Oh, that's what I didn't know before. Thank for your information.
> 
> 
> > however a list would also fix the opposite case; where the destination has an extra device
> > that the source does not have, in most cases I think that doesn't cause a failure at the moment
> What's our expected behavior? I think in most cases(where the destination has an extra device
> that the source does not have), destination seems fine after migration.
> 
> If we expect migration failure, it needs more check.

I think it's a check we could add that would detect more misconfigurations.
  (you'd have to be careful about devices that just didn't need to send any migration state;
   and make sure it didn't break existing migrations).

> > but the device has an unusual state.
> > 
> > A QEMU_VM_SECTION_DEVICE_LIST would work, but perhaps a subsection of vmstate_globalstate
> > would work?
> 
> Currently, the vmstate_globalstate was saved/restored at the migration completion stage while I expect
> the beginning of migration.

Ah sorry, I was confused; it's vmstate_configuration (see migration/savevm.c) that's saved
at the beginning.

Dave

> And i cook the V2, with a new section type called SECTION_HEADER, any comment is welcomed.
> 
> Thanks
> 
> > 
> > Dave
> > 
> > > 
> > > Please correct me.
> > > 
> > > Thanks
> > > Zhijian
> > > 
> > > > 
> > > > 
> > > > 		Amit
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > 
> > .
> > 
> 
> -- 
> Best regards.
> Li Zhijian (8555)
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox

Patch

diff --git a/migration/savevm.c b/migration/savevm.c
index 09d8e99..0c07671 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -887,6 +887,7 @@  void qemu_savevm_state_begin(QEMUFile *f,
 
     trace_savevm_state_begin();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        save_section_header(f, se, QEMU_VM_SECTION_START);
         if (!se->ops || !se->ops->set_params) {
             continue;
         }
@@ -902,7 +903,7 @@  void qemu_savevm_state_begin(QEMUFile *f,
                 continue;
             }
         }
-        save_section_header(f, se, QEMU_VM_SECTION_START);
+        save_section_header(f, se, QEMU_VM_SECTION_FULL);
 
         ret = se->ops->save_live_setup(f, se->opaque);
         save_section_footer(f, se);
@@ -1775,7 +1776,7 @@  qemu_loadvm_section_start(QEMUFile *f, MigrationIncomingState *mis,
 }
 
 static int
-qemu_loadvm_section_full(QEMUFile *f, MigrationIncomingState *mis)
+qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint32_t version_id, section_id;
     SaveStateEntry *se;
@@ -1846,14 +1847,21 @@  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret;
+    SaveStateEntry *se;
+    uint32_t instance, version;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
+            ret = qemu_loadvm_section_start(f, mis, &se, &instance, &version);
+            if (ret < 0) {
+                return ret;
+            }
+            break;
         case QEMU_VM_SECTION_FULL:
-            ret = qemu_loadvm_section_full(f, mis);
+            ret = qemu_loadvm_section_start_full(f, mis);
             if (ret < 0) {
                 return ret;
             }