Message ID | 1492850128-17472-8-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive)
---|---
State | New, archived
zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> We should not load PVM's state directly into SVM, because errors may
> happen while SVM is receiving data, which would break SVM.
>
> We need to ensure that all data has been received before we load the
> state into SVM. We use extra memory to cache this data (PVM's ram).
> The ram cache on the secondary side is initially the same as SVM/PVM's
> memory. During a checkpoint, we first cache PVM's dirty pages in this
> ram cache, so the ram cache is always the same as PVM's memory at
> every checkpoint; we then flush this cached ram to SVM after we
> receive all of PVM's state.
>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v2:
>  - Move colo_init_ram_cache() and colo_release_ram_cache() out of the
>    incoming thread, since both of them need the global lock; if we
>    keep colo_release_ram_cache() in the incoming thread, there is a
>    potential deadlock.
>  - Remove the bool ram_cache_enable flag; use
>    migration_incoming_in_state() instead.
>  - Remove the Reviewed-by tag because of the above changes.

> +out_locked:
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        if (block->colo_cache) {
> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
> +            block->colo_cache = NULL;
> +        }
> +    }
> +
> +    rcu_read_unlock();
> +    return -errno;
> +}
> +
> +/* It is necessary to hold the global lock to call this helper */
> +void colo_release_ram_cache(void)
> +{
> +    RAMBlock *block;
> +
> +    rcu_read_lock();
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        if (block->colo_cache) {
> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
> +            block->colo_cache = NULL;
> +        }
> +    }
> +    rcu_read_unlock();
> +}

Create a function for the creation/removal? We have exactly two copies
of the same code. Right now the code inside the function is very small,
but it could be bigger, no?

Later, Juan.
On 2017/4/25 2:27, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> We should not load PVM's state directly into SVM, because errors may
>> happen while SVM is receiving data, which would break SVM.
>>
>> We need to ensure that all data has been received before we load the
>> state into SVM. We use extra memory to cache this data (PVM's ram).
>> The ram cache on the secondary side is initially the same as
>> SVM/PVM's memory. During a checkpoint, we first cache PVM's dirty
>> pages in this ram cache, so the ram cache is always the same as
>> PVM's memory at every checkpoint; we then flush this cached ram to
>> SVM after we receive all of PVM's state.
>>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v2:
>>  - Move colo_init_ram_cache() and colo_release_ram_cache() out of the
>>    incoming thread, since both of them need the global lock; if we
>>    keep colo_release_ram_cache() in the incoming thread, there is a
>>    potential deadlock.
>>  - Remove the bool ram_cache_enable flag; use
>>    migration_incoming_in_state() instead.
>>  - Remove the Reviewed-by tag because of the above changes.
>
>> +out_locked:
>> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +        if (block->colo_cache) {
>> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
>> +            block->colo_cache = NULL;
>> +        }
>> +    }
>> +
>> +    rcu_read_unlock();
>> +    return -errno;
>> +}
>> +
>> +/* It is necessary to hold the global lock to call this helper */
>> +void colo_release_ram_cache(void)
>> +{
>> +    RAMBlock *block;
>> +
>> +    rcu_read_lock();
>> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +        if (block->colo_cache) {
>> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
>> +            block->colo_cache = NULL;
>> +        }
>> +    }
>> +    rcu_read_unlock();
>> +}
>
> Create a function for the creation/removal? We have exactly two copies
> of the same code. Right now the code inside the function is very
> small, but it could be bigger, no?

Yes, we add more code in the next patch (patch 08). :)

> Later, Juan.
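For illustration, the duplicated removal loop Juan points at could be folded into one shared helper along the following lines. This is only a minimal sketch; the helper name colo_free_ram_cache_locked is hypothetical and does not appear in the posted series:

/* Hypothetical helper (sketch only): free every per-block COLO cache.
 * The caller must already hold rcu_read_lock(), as both existing call
 * sites do (the out_locked error path and colo_release_ram_cache()).
 */
static void colo_free_ram_cache_locked(void)
{
    RAMBlock *block;

    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        if (block->colo_cache) {
            qemu_anon_ram_free(block->colo_cache, block->used_length);
            block->colo_cache = NULL;
        }
    }
}

With such a helper, both the error path of colo_init_ram_cache() and the body of colo_release_ram_cache() reduce to a single call inside their existing rcu_read_lock()/rcu_read_unlock() sections.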
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c9ddcd0..0b3d77c 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -27,6 +27,7 @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *colo_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..ba765eb 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -360,4 +360,8 @@ uint64_t ram_pagesize_summary(void);
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 755ea54..7419404 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,6 +441,10 @@ static void process_incoming_migration_co(void *opaque)
             error_report_err(local_err);
             exit(EXIT_FAILURE);
         }
+        if (colo_init_ram_cache() < 0) {
+            error_report("Init ram cache failed");
+            exit(EXIT_FAILURE);
+        }
         mis->migration_incoming_co = qemu_coroutine_self();
         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
@@ -449,6 +453,8 @@ static void process_incoming_migration_co(void *opaque)
 
         /* Wait checkpoint incoming thread exit before free resource */
         qemu_thread_join(&mis->colo_incoming_thread);
+        /* We hold the global iothread lock, so it is safe here */
+        colo_release_ram_cache();
     }
 
     if (ret < 0) {
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..05d1b06 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2265,6 +2265,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+                                                 ram_addr_t offset)
+{
+    if (!offset_in_ramblock(block, offset)) {
+        return NULL;
+    }
+    if (!block->colo_cache) {
+        error_report("%s: colo_cache is NULL in block: %s",
+                     __func__, block->idstr);
+        return NULL;
+    }
+    return block->colo_cache + offset;
+}
+
 /**
  * ram_handle_compressed: handle the zero page case
  *
@@ -2605,7 +2619,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
-            host = host_from_ram_block_offset(block, addr);
+            /* After going into COLO, we should load the page into colo_cache */
+            if (migration_incoming_in_colo_state()) {
+                host = colo_cache_from_block_offset(block, addr);
+            } else {
+                host = host_from_ram_block_offset(block, addr);
+            }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2712,6 +2731,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for the secondary VM; we cache the whole
+ * memory of the secondary VM. It is necessary to hold the global
+ * lock to call this helper.
+ */
+int colo_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
+        if (!block->colo_cache) {
+            error_report("%s: Can't alloc memory for COLO cache of block %s, "
+                         "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
+                         block->used_length);
+            goto out_locked;
+        }
+        memcpy(block->colo_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+    return 0;
+
+out_locked:
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+
+    rcu_read_unlock();
+    return -errno;
+}
+
+/* It is necessary to hold the global lock to call this helper */
+void colo_release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+    rcu_read_unlock();
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
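The "flush this cached ram to SVM" step described in the commit message is not implemented in this patch; it arrives later in the series. Purely as a sketch of the idea, and assuming a whole-block copy rather than the dirty-page tracking the series actually builds toward, the flush could look roughly like this (colo_flush_ram_cache_sketch is a made-up name for illustration):

/* Simplified illustration only: after a complete checkpoint has been
 * received into colo_cache, copy it back into the secondary VM's real
 * memory. Copying whole blocks is correct but wasteful; the series
 * records which pages were dirtied so the flush can copy only those.
 */
static void colo_flush_ram_cache_sketch(void)
{
    RAMBlock *block;

    rcu_read_lock();
    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        if (block->colo_cache) {
            memcpy(block->host, block->colo_cache, block->used_length);
        }
    }
    rcu_read_unlock();
}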