Message ID | 1451442548-26974-13-git-send-email-wency@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Dec 30, 2015 at 10:29:02AM +0800, Wen Congyang wrote: > In COLO mode, both VMs are running, and are considered in sync if the > visible network traffic is identical. After some time, they fall out of > sync. > > At this point, the two VMs have definitely diverged. Lets call the > primary dirty bitmap set A, while the secondary dirty bitmap set B. > > Sets A and B are different. > > Under normal migration, the page data for set A will be sent form the s/form/from/ > primary to the secondary. > > However, the set difference B - A (lets call this C) is out-of-date on > the secondary (with respect to the primary) and will not be sent by the > primary, as it was not memory dirtied by the primary. The secondary s/primary/primary (to secondary)/ > needs the page data for C to reconstruct an exact copy of the primary at s/the page data/C page data/ > the checkpoint. > > The secondary cannot calculate C as it doesn't know A. Instead, the > secondary must send B to the primary, at which point the primary > calculates the union of A and B (lets call this D) which is all the > pages dirtied by both the primary and the secondary, and sends all page > data covered by D. You could invert this - the primary could send A to secondary? I presume this non-optimal as the 'A' set is much much bigger than 'C' set? It may be good to include this in the commit description. > > In the general case, D is a superset of both A and B. Without the > backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid > copy of the primary. > > We transfer the dirty bitmap on libxc side, so we need to introduce back > channel to libxc. > > Note: it is different from the paper. We change the original design to > the current one, according to our following concerns: > 1. The original design needs extra memory on Secondary host. When there's > multiple backups on one host, the memory cost is high. > 2. The memory cache code will be another 1k+, it will make the review > more time consuming. Well, that 2) is a very good reason :-) > > Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> > commit message: ? Huh? > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Ian Campbell <Ian.Campbell@citrix.com> > CC: Ian Jackson <Ian.Jackson@eu.citrix.com> > CC: Wei Liu <wei.liu2@citrix.com> .. snip.. > index 05159bb..d4dc501 100644 > --- a/tools/libxc/xc_sr_restore.c > +++ b/tools/libxc/xc_sr_restore.c > @@ -722,7 +722,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > unsigned long *console_gfn, domid_t console_domid, > unsigned int hvm, unsigned int pae, int superpages, > int checkpointed_stream, > - struct restore_callbacks *callbacks) > + struct restore_callbacks *callbacks, int back_fd) > { > struct xc_sr_context ctx = > { > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c > index 8ffd71d..a49d083 100644 > --- a/tools/libxc/xc_sr_save.c > +++ b/tools/libxc/xc_sr_save.c > @@ -824,7 +824,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) > int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, > uint32_t max_iters, uint32_t max_factor, uint32_t flags, > struct save_callbacks* callbacks, int hvm, > - int checkpointed_stream) > + int checkpointed_stream, int back_fd) > { > struct xc_sr_context ctx = > { But where is the code? Or is that suppose to be done in another patch? If so you may want to mention that in the commit description?
On 01/26/2016 03:41 AM, Konrad Rzeszutek Wilk wrote: > On Wed, Dec 30, 2015 at 10:29:02AM +0800, Wen Congyang wrote: >> In COLO mode, both VMs are running, and are considered in sync if the >> visible network traffic is identical. After some time, they fall out of >> sync. >> >> At this point, the two VMs have definitely diverged. Lets call the >> primary dirty bitmap set A, while the secondary dirty bitmap set B. >> >> Sets A and B are different. >> >> Under normal migration, the page data for set A will be sent form the > > s/form/from/ > >> primary to the secondary. >> >> However, the set difference B - A (lets call this C) is out-of-date on >> the secondary (with respect to the primary) and will not be sent by the >> primary, as it was not memory dirtied by the primary. The secondary > > s/primary/primary (to secondary)/ > >> needs the page data for C to reconstruct an exact copy of the primary at > > s/the page data/C page data/ > >> the checkpoint. >> >> The secondary cannot calculate C as it doesn't know A. Instead, the >> secondary must send B to the primary, at which point the primary >> calculates the union of A and B (lets call this D) which is all the >> pages dirtied by both the primary and the secondary, and sends all page >> data covered by D. > > You could invert this - the primary could send A to secondary? I presume > this non-optimal as the 'A' set is much much bigger than 'C' set? 'C' set is the one in 'B' set but not in 'A' set. > > It may be good to include this in the commit description. > >> >> In the general case, D is a superset of both A and B. Without the >> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid >> copy of the primary. >> >> We transfer the dirty bitmap on libxc side, so we need to introduce back >> channel to libxc. > >> >> Note: it is different from the paper. We change the original design to >> the current one, according to our following concerns: >> 1. The original design needs extra memory on Secondary host. When there's >> multiple backups on one host, the memory cost is high. >> 2. The memory cache code will be another 1k+, it will make the review >> more time consuming. > > Well, that 2) is a very good reason :-) >> >> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> >> commit message: > > ? Huh? I don't know what it is. Will remove it in the next version. > >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> CC: Ian Campbell <Ian.Campbell@citrix.com> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> >> CC: Wei Liu <wei.liu2@citrix.com> > > .. snip.. >> index 05159bb..d4dc501 100644 >> --- a/tools/libxc/xc_sr_restore.c >> +++ b/tools/libxc/xc_sr_restore.c >> @@ -722,7 +722,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> unsigned long *console_gfn, domid_t console_domid, >> unsigned int hvm, unsigned int pae, int superpages, >> int checkpointed_stream, >> - struct restore_callbacks *callbacks) >> + struct restore_callbacks *callbacks, int back_fd) >> { >> struct xc_sr_context ctx = >> { >> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c >> index 8ffd71d..a49d083 100644 >> --- a/tools/libxc/xc_sr_save.c >> +++ b/tools/libxc/xc_sr_save.c >> @@ -824,7 +824,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) >> int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, >> uint32_t max_iters, uint32_t max_factor, uint32_t flags, >> struct save_callbacks* callbacks, int hvm, >> - int checkpointed_stream) >> + int checkpointed_stream, int back_fd) >> { >> struct xc_sr_context ctx = >> { > > > But where is the code? > > Or is that suppose to be done in another patch? If so you may want to > mention that in the commit description? Do you mean where is the code that uses back_fd? It is in another series: http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html Thanks Wen Congyang > > > > . >
> > Or is that suppose to be done in another patch? If so you may want to > > mention that in the commit description? > > Do you mean where is the code that uses back_fd? It is in another series: > http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html Ah right that big patchset one. Hadn't looked at that yet - it is a bit hard without having a git tree on which the foundation patches (this patch series) are applied so I can look at the contents of the functions.
On 01/26/2016 10:29 PM, Konrad Rzeszutek Wilk wrote: >>> Or is that suppose to be done in another patch? If so you may want to >>> mention that in the commit description? >> >> Do you mean where is the code that uses back_fd? It is in another series: >> http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html > > Ah right that big patchset one. Hadn't looked at that yet - it is a bit hard > without having a git tree on which the foundation patches (this patch > series) are applied so I can look at the contents of the functions. Yes, I will provide a git tree to help review. Thanks Wen Congyang > > > > > . >
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index e8bc46d..bd133af 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -82,7 +82,7 @@ struct save_callbacks { int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, - int checkpointed_stream); + int checkpointed_stream, int back_fd); /* callbacks provided by xc_domain_restore */ struct restore_callbacks { @@ -121,7 +121,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, unsigned long *console_mfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, int checkpointed_stream, - struct restore_callbacks *callbacks); + struct restore_callbacks *callbacks, int back_fd); /** * This function will create a domain for a paravirtualized Linux diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index c9124df..089f767 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -23,7 +23,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, uint32_t max_factor, uint32_t flags, struct save_callbacks* callbacks, int hvm, - int checkpointed_stream) + int checkpointed_stream, int back_fd) { errno = ENOSYS; return -1; @@ -35,7 +35,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, unsigned long *console_mfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, int checkpointed_stream, - struct restore_callbacks *callbacks) + struct restore_callbacks *callbacks, int back_fd) { errno = ENOSYS; return -1; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index 05159bb..d4dc501 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -722,7 +722,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, unsigned long *console_gfn, domid_t console_domid, unsigned int hvm, unsigned int pae, int superpages, int checkpointed_stream, - struct restore_callbacks *callbacks) + struct restore_callbacks *callbacks, int back_fd) { struct xc_sr_context ctx = { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 8ffd71d..a49d083 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -824,7 +824,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, uint32_t max_factor, uint32_t flags, struct save_callbacks* callbacks, int hvm, - int checkpointed_stream) + int checkpointed_stream, int back_fd) { struct xc_sr_context ctx = { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 416b318..631e3e2 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -27,7 +27,7 @@ */ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, const char *mode_arg, - int stream_fd, + int stream_fd, int back_fd, const int *preserve_fds, int num_preserve_fds, const unsigned long *argnums, int num_argnums); @@ -50,6 +50,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs, /* Convenience aliases */ const uint32_t domid = dcs->guest_domid; const int restore_fd = dcs->libxc_fd; + const int send_fd = dcs->send_fd; libxl__domain_build_state *const state = &dcs->build_state; unsigned cbflags = @@ -71,7 +72,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs, shs->caller_state = dcs; shs->need_results = 1; - run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0, + run_helper(egc, shs, "--restore-domain", restore_fd, send_fd, 0, 0, argnums, ARRAY_SIZE(argnums)); } @@ -95,7 +96,7 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss, shs->caller_state = dss; shs->need_results = 0; - run_helper(egc, shs, "--save-domain", dss->fd, + run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd, NULL, 0, argnums, ARRAY_SIZE(argnums)); return; @@ -118,14 +119,29 @@ void libxl__save_helper_init(libxl__save_helper_state *shs) } /*----- helper execution -----*/ +static int dup_fd_helper(libxl__gc *gc, int fd, const char *what) +{ + int dup_fd = fd; + + if (fd <= 2) { + dup_fd = dup(fd); + if (dup_fd < 0) { + LOGE(ERROR,"dup %s", what); + exit(-1); + } + } + libxl_fd_set_cloexec(CTX, dup_fd, 0); + + return dup_fd; +} static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, - const char *mode_arg, int stream_fd, + const char *mode_arg, int stream_fd, int back_fd, const int *preserve_fds, int num_preserve_fds, const unsigned long *argnums, int num_argnums) { STATE_AO_GC(shs->ao); - const char *args[4 + num_argnums]; + const char *args[5 + num_argnums]; const char **arg = args; int i, rc; @@ -153,6 +169,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" "libxl-save-helper"; *arg++ = mode_arg; const char **stream_fd_arg = arg++; + const char **back_fd_arg = arg++; for (i=0; i<num_argnums; i++) *arg++ = GCSPRINTF("%lu", argnums[i]); *arg++ = 0; @@ -177,16 +194,12 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited); if (!pid) { - if (stream_fd <= 2) { - stream_fd = dup(stream_fd); - if (stream_fd < 0) { - LOGE(ERROR,"dup migration stream fd"); - exit(-1); - } - } - libxl_fd_set_cloexec(CTX, stream_fd, 0); + stream_fd = dup_fd_helper(gc, stream_fd, "migration stream fd"); *stream_fd_arg = GCSPRINTF("%d", stream_fd); + back_fd = dup_fd_helper(gc, back_fd, "migration back channel fd"); + *back_fd_arg = GCSPRINTF("%d", back_fd); + for (i=0; i<num_preserve_fds; i++) if (preserve_fds[i] >= 0) { assert(preserve_fds[i] > 2); diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 6bdcf13..9bdcf41 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -238,6 +238,7 @@ static struct restore_callbacks helper_restore_callbacks; int main(int argc, char **argv) { int r; + int back_fd; #define NEXTARG (++argv, assert(*argv), *argv) @@ -247,6 +248,7 @@ int main(int argc, char **argv) if (!strcmp(mode,"--save-domain")) { io_fd = atoi(NEXTARG); + back_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); uint32_t max_iters = strtoul(NEXTARG,0,10); uint32_t max_factor = strtoul(NEXTARG,0,10); @@ -262,12 +264,14 @@ int main(int argc, char **argv) setup_signals(save_signal_handler); r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, - &helper_save_callbacks, hvm, checkpointed_stream); + &helper_save_callbacks, hvm, checkpointed_stream, + back_fd); complete(r); } else if (!strcmp(mode,"--restore-domain")) { io_fd = atoi(NEXTARG); + back_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); unsigned store_evtchn = strtoul(NEXTARG,0,10); domid_t store_domid = strtoul(NEXTARG,0,10); @@ -292,7 +296,7 @@ int main(int argc, char **argv) store_domid, console_evtchn, &console_mfn, console_domid, hvm, pae, superpages, checkpointed, - &helper_restore_callbacks); + &helper_restore_callbacks, back_fd); helper_stub_restore_results(store_mfn,console_mfn,0); complete(r);