Message ID | 567C9FBD.4000104@cn.fujitsu.com (mailing list archive)
State      | New, archived
On 12/25/2015 09:45 AM, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebased the COLO code onto the newest upstream Xen and tested it. I
>>> found a problem in the test, and I can reproduce it via migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>>
>> You are the very first person to try a use case like this.
>>
>> It works as much as it does because of your changes to the
>> uncooperative HVM domain logic. I have said repeatedly during review
>> that this is not necessarily a safe change to make without an in-depth
>> analysis of the knock-on effects; it looks as if you have found the
>> first knock-on effect.
>>
>>> The migration succeeds, but the VM doesn't run on the target machine.
>>> You can see the reason in 'xl dmesg':
>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>> (d2) HVM Loader
>>> (d2) Detected Xen v4.7-unstable
>>> (d2) Get guest memory maps[128] failed. (-38)
>>> (d2) *** HVMLoader bug at e820.c:39
>>> (d2) *** HVMLoader crashed.
>>>
>>> The reason is that we don't call xc_domain_set_memory_map() on the
>>> target machine. When we create an HVM domain:
>>> libxl__domain_build()
>>>     libxl__build_hvm()
>>>         libxl__arch_domain_construct_memmap()
>>>             xc_domain_set_memory_map()
>>>
>>> Should we migrate the guest memory map from the source machine to the
>>> target machine?
>>
>> This bug specifically is because HVMLoader is expected to have run and
>> turned the hypercall information into an E820 table in the guest
>> before a migration occurs.
>>
>> Unfortunately, the current codebase is riddled with such assumptions
>> and expectations (e.g. the HVM save code assumes that FPU context is
>> valid when it is saving register state), which is a direct side effect
>> of how it was developed.
>>
>> Having said all of the above, I agree that your example is a use case
>> which should work. It is the ultimate test of whether the migration
>> stream contains enough information to faithfully reproduce the domain
>> on the far side. Clearly, at the moment, this is not the case.
>>
>> I have an upcoming project to work on the domain memory layout logic,
>> because it is unsuitable for a number of XenServer use cases. Part of
>> that will require moving it into the migration stream.
>
> I found another migration problem in the test:
> If the migration fails, we resume the guest on the source side, but the
> HVM guest doesn't respond any more.
>
> In my test environment, the migration always succeeds, so I used a hack
> to reproduce the failure:
> 1. Modify the target xen tools:
>
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..da95606 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>          goto err;
>      }
>
> +    rc = ERROR_FAIL;
> +
>  err:
>      check_all_finished(egc, stream, rc);
>
> 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
> 3. xl migrate hvm_nopv 192.168.3.1

Another problem:
If migration fails after the guest is suspended, we resume it on the
source. In this case, we cannot shut it down, because no process handles
the shutdown event. The log in /var/log/xen/xl-hvm_nopv.log:

Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
Domain 1 has shut down, reason code 2 0x2
Domain has suspended.
Done. Exiting now

The xl process has exited...

Thanks
Wen Congyang

> The reason is that:
> We create a default ioreq server when we get the hvm param
> HVM_PARAM_IOREQ_PFN. It means that the problem occurs only when the
> migration fails after we get the hvm param HVM_PARAM_IOREQ_PFN.
>
> In the function hvm_select_ioreq_server():
> If the I/O will be handled by a non-default ioreq server, we return the
> non-default ioreq server. In this case, it is handled by qemu.
> If the I/O will not be handled by a non-default ioreq server, we return
> the default ioreq server. Before migration we return NULL, and after
> migration it is not NULL.
> See the caller, hvmemul_do_io():
>     case X86EMUL_UNHANDLEABLE:
>     {
>         struct hvm_ioreq_server *s =
>             hvm_select_ioreq_server(curr->domain, &p);
>
>         /* If there is no suitable backing DM, just ignore accesses */
>         if ( !s )
>         {
>             rc = hvm_process_io_intercept(&null_handler, &p);
>             vio->io_req.state = STATE_IOREQ_NONE;
>         }
>         else
>         {
>             rc = hvm_send_ioreq(s, &p, 0);
>             if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>                 vio->io_req.state = STATE_IOREQ_NONE;
>             else if ( data_is_addr )
>                 rc = X86EMUL_OKAY;
>         }
>         break;
>
> We send the I/O request to the default I/O request server, but no
> backing DM handles it. We will wait for the I/O forever...
>
> Thanks
> Wen Congyang
>
>> ~Andrew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
On 25/12/2015 01:45, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebased the COLO code onto the newest upstream Xen and tested it. I
>>> found a problem in the test, and I can reproduce it via migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>> You are the very first person to try a use case like this.
>>
>> It works as much as it does because of your changes to the
>> uncooperative HVM domain logic. I have said repeatedly during review
>> that this is not necessarily a safe change to make without an in-depth
>> analysis of the knock-on effects; it looks as if you have found the
>> first knock-on effect.
>>
>>> The migration succeeds, but the VM doesn't run on the target machine.
>>> You can see the reason in 'xl dmesg':
>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>> (d2) HVM Loader
>>> (d2) Detected Xen v4.7-unstable
>>> (d2) Get guest memory maps[128] failed. (-38)
>>> (d2) *** HVMLoader bug at e820.c:39
>>> (d2) *** HVMLoader crashed.
>>>
>>> The reason is that we don't call xc_domain_set_memory_map() on the
>>> target machine. When we create an HVM domain:
>>> libxl__domain_build()
>>>     libxl__build_hvm()
>>>         libxl__arch_domain_construct_memmap()
>>>             xc_domain_set_memory_map()
>>>
>>> Should we migrate the guest memory map from the source machine to the
>>> target machine?
>> This bug specifically is because HVMLoader is expected to have run and
>> turned the hypercall information into an E820 table in the guest
>> before a migration occurs.
>>
>> Unfortunately, the current codebase is riddled with such assumptions
>> and expectations (e.g. the HVM save code assumes that FPU context is
>> valid when it is saving register state), which is a direct side effect
>> of how it was developed.
>>
>> Having said all of the above, I agree that your example is a use case
>> which should work. It is the ultimate test of whether the migration
>> stream contains enough information to faithfully reproduce the domain
>> on the far side. Clearly, at the moment, this is not the case.
>>
>> I have an upcoming project to work on the domain memory layout logic,
>> because it is unsuitable for a number of XenServer use cases. Part of
>> that will require moving it into the migration stream.
> I found another migration problem in the test:
> If the migration fails, we resume the guest on the source side, but the
> HVM guest doesn't respond any more.
>
> In my test environment, the migration always succeeds, so I used a hack
> to reproduce the failure:
> 1. Modify the target xen tools:
>
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..da95606 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>          goto err;
>      }
>
> +    rc = ERROR_FAIL;
> +
>  err:
>      check_all_finished(egc, stream, rc);
>
> 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
> 3. xl migrate hvm_nopv 192.168.3.1
>
> The reason is that:
> We create a default ioreq server when we get the hvm param
> HVM_PARAM_IOREQ_PFN. It means that the problem occurs only when the
> migration fails after we get the hvm param HVM_PARAM_IOREQ_PFN.
>
> In the function hvm_select_ioreq_server():
> If the I/O will be handled by a non-default ioreq server, we return the
> non-default ioreq server. In this case, it is handled by qemu.
> If the I/O will not be handled by a non-default ioreq server, we return
> the default ioreq server. Before migration we return NULL, and after
> migration it is not NULL.
> See the caller, hvmemul_do_io():
>     case X86EMUL_UNHANDLEABLE:
>     {
>         struct hvm_ioreq_server *s =
>             hvm_select_ioreq_server(curr->domain, &p);
>
>         /* If there is no suitable backing DM, just ignore accesses */
>         if ( !s )
>         {
>             rc = hvm_process_io_intercept(&null_handler, &p);
>             vio->io_req.state = STATE_IOREQ_NONE;
>         }
>         else
>         {
>             rc = hvm_send_ioreq(s, &p, 0);
>             if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>                 vio->io_req.state = STATE_IOREQ_NONE;
>             else if ( data_is_addr )
>                 rc = X86EMUL_OKAY;
>         }
>         break;
>
> We send the I/O request to the default I/O request server, but no
> backing DM handles it. We will wait for the I/O forever...

Hmm yes. This needs fixing.

CC'ing Paul, who did the ioreq server work.

This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
migration code needs a way of querying whether a default ioreq server
exists without creating one.

Can you remember what the justification for the read side effects was?
ISTR it was only for qemu compatibility until the ioreq server work got
in upstream. If that was the case, can we drop the read side effects now
and mandate that all qemus explicitly create their ioreq servers (even
if this involves creating a default ioreq server for qemu-trad)?

~Andrew
On 25/12/2015 03:06, Wen Congyang wrote:
> On 12/25/2015 09:45 AM, Wen Congyang wrote:
>> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>>> On 24/12/15 02:29, Wen Congyang wrote:
>>>> Hi Andrew Cooper:
>>>>
>>>> I rebased the COLO code onto the newest upstream Xen and tested it.
>>>> I found a problem in the test, and I can reproduce it via migration.
>>>>
>>>> How to reproduce:
>>>> 1. xl cr -p hvm_nopv
>>>> 2. xl migrate hvm_nopv 192.168.3.1
>>> You are the very first person to try a use case like this.
>>>
>>> It works as much as it does because of your changes to the
>>> uncooperative HVM domain logic. I have said repeatedly during review
>>> that this is not necessarily a safe change to make without an
>>> in-depth analysis of the knock-on effects; it looks as if you have
>>> found the first knock-on effect.
>>>
>>>> The migration succeeds, but the VM doesn't run on the target
>>>> machine. You can see the reason in 'xl dmesg':
>>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>>> (d2) HVM Loader
>>>> (d2) Detected Xen v4.7-unstable
>>>> (d2) Get guest memory maps[128] failed. (-38)
>>>> (d2) *** HVMLoader bug at e820.c:39
>>>> (d2) *** HVMLoader crashed.
>>>>
>>>> The reason is that we don't call xc_domain_set_memory_map() on the
>>>> target machine. When we create an HVM domain:
>>>> libxl__domain_build()
>>>>     libxl__build_hvm()
>>>>         libxl__arch_domain_construct_memmap()
>>>>             xc_domain_set_memory_map()
>>>>
>>>> Should we migrate the guest memory map from the source machine to
>>>> the target machine?
>>> This bug specifically is because HVMLoader is expected to have run
>>> and turned the hypercall information into an E820 table in the guest
>>> before a migration occurs.
>>>
>>> Unfortunately, the current codebase is riddled with such assumptions
>>> and expectations (e.g. the HVM save code assumes that FPU context is
>>> valid when it is saving register state), which is a direct side
>>> effect of how it was developed.
>>>
>>> Having said all of the above, I agree that your example is a use case
>>> which should work. It is the ultimate test of whether the migration
>>> stream contains enough information to faithfully reproduce the domain
>>> on the far side. Clearly, at the moment, this is not the case.
>>>
>>> I have an upcoming project to work on the domain memory layout logic,
>>> because it is unsuitable for a number of XenServer use cases. Part of
>>> that will require moving it into the migration stream.
>> I found another migration problem in the test:
>> If the migration fails, we resume the guest on the source side, but
>> the HVM guest doesn't respond any more.
>>
>> In my test environment, the migration always succeeds, so I used a
>> hack to reproduce the failure:
>> 1. Modify the target xen tools:
>>
>> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
>> index 258dec4..da95606 100644
>> --- a/tools/libxl/libxl_stream_read.c
>> +++ b/tools/libxl/libxl_stream_read.c
>> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>>          goto err;
>>      }
>>
>> +    rc = ERROR_FAIL;
>> +
>>  err:
>>      check_all_finished(egc, stream, rc);
>>
>> 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
>> 3. xl migrate hvm_nopv 192.168.3.1
> Another problem:
> If migration fails after the guest is suspended, we resume it on the
> source. In this case, we cannot shut it down, because no process
> handles the shutdown event.
> The log in /var/log/xen/xl-hvm_nopv.log:
> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
> Domain 1 has shut down, reason code 2 0x2
> Domain has suspended.
> Done. Exiting now
>
> The xl process has exited...
>
> Thanks
> Wen Congyang

Hmm yes. This is a libxl bug in libxl_evenable_domain_death(). CC'ing
the toolstack maintainers.

It waits for the @releasedomain watch, but doesn't interpret the results
correctly. In particular, if it can still make successful hypercalls
with the provided domid, that domain was not the subject of
@releasedomain.

(I also observe that domain_death_xswatch_callback() is very
inefficient. It only needs to make a single hypercall, not query the
entire state of all domains.)

~Andrew
> -----Original Message-----
[snip]
> > We send the I/O request to the default I/O request server, but no
> > backing DM handles it. We will wait for the I/O forever...
>
> Hmm yes. This needs fixing.
>
> CC'ing Paul, who did the ioreq server work.
>
> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
> migration code needs a way of querying whether a default ioreq server
> exists without creating one.
>
> Can you remember what the justification for the read side effects was?
> ISTR it was only for qemu compatibility until the ioreq server work got
> in upstream. If that was the case, can we drop the read side effects
> now and mandate that all qemus explicitly create their ioreq servers
> (even if this involves creating a default ioreq server for qemu-trad)?
>

Yes, you're correct that the read side-effect was the only way of
keeping compatibility with existing QEMU at the time. It would be nice
to remove that hackery, but it would indeed require a patch to qemu-trad
and would clearly break compatibility with distro qemu packages built
prior to my modification.

An alternative solution to avoid breaking compatibility (albeit a bit
icky) would be to turn off the side effects when
HVMOP_create_ioreq_server is handled. There should be no case where a
non-default server needs to be created in advance of a default server.

  Paul

> ~Andrew
On 04/01/16 10:28, Paul Durrant wrote:
>> -----Original Message-----
> [snip]
>>> We send the I/O request to the default I/O request server, but no
>>> backing DM handles it. We will wait for the I/O forever...
>> Hmm yes. This needs fixing.
>>
>> CC'ing Paul who did the ioreq server work.
>>
>> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN.
>> The migration code needs a way of querying whether a default ioreq
>> server exists without creating one.
>>
>> Can you remember what the justification for the read side effects was?
>> ISTR it was only for qemu compatibility until the ioreq server work
>> got in upstream. If that was the case, can we drop the read side
>> effects now and mandate that all qemus explicitly create their ioreq
>> servers (even if this involves creating a default ioreq server for
>> qemu-trad)?
>>
> Yes, you're correct that the read side-effect was the only way of
> keeping compatibility with existing QEMU at the time. It would be nice
> to remove that hackery, but it would indeed require a patch to
> qemu-trad and would clearly break compatibility with distro qemu
> packages built prior to my modification.
> An alternative solution to avoid breaking compatibility (albeit a bit
> icky) would be to turn off the side effects when
> HVMOP_create_ioreq_server is handled. There should be no case where a
> non-default server needs to be created in advance of a default server.

I was under the impression that qemu-trad (like libxc) was expected to
match exactly. There is deliberately no facility to use a
distro-packaged qemu-trad in the Xen build system. CC'ing the toolstack
maintainers for their input.

If this is indeed the case, then the former option is the better course
of action.

~Andrew
> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 04 January 2016 10:36
> To: Paul Durrant; Wen Congyang
> Cc: xen devel; Ian Campbell; Ian Jackson; Wei Liu
> Subject: Re: [Xen-devel] question about migration
>
> On 04/01/16 10:28, Paul Durrant wrote:
> >> -----Original Message-----
> > [snip]
> >>> We send the I/O request to the default I/O request server, but no
> >>> backing DM handles it. We will wait for the I/O forever...
> >> Hmm yes. This needs fixing.
> >>
> >> CC'ing Paul who did the ioreq server work.
> >>
> >> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN.
> >> The migration code needs a way of querying whether a default ioreq
> >> server exists without creating one.
> >>
> >> Can you remember what the justification for the read side effects
> >> were? ISTR that it was only for qemu compatibility until the ioreq
> >> server work got in upstream. If that was the case, can we drop the
> >> read side effects now and mandate that all qemus explicitly create
> >> their ioreq servers (even if this involves creating a default ioreq
> >> server for qemu-trad)?
> >>
> > Yes, you're correct that the read side-effect was the only way of
> > keeping compatibility with existing QEMU at the time. It would be
> > nice to remove that hackery but it would indeed require a patch to
> > qemu-trad and would clearly break compatibility with distro qemu
> > packages built prior to my modification.
> > An alternative solution to avoid breaking compatibility (albeit a bit
> > icky) would be to turn off the side effects when
> > HVMOP_create_ioreq_server is handled. There should be no case where a
> > non-default server needs to be created in advance of a default
> > server.
>
> I was under the impression that qemu-trad (like libxc) was expected to
> match exactly. There is deliberately no facility to use a
> distro-packaged qemu-trad in the Xen build system. CC'ing toolstack
> maintainers for their input.

It wasn't clear, but I meant that compatibility with *upstream* QEMU
builds prior to my patch would be broken. It depends on whether we care
about this or not.

  Paul

> If this is indeed the case, then the former option is the better course
> of action.
>
> ~Andrew
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
 err:
     check_all_finished(egc, stream, rc);