[V2,4/8] COLO: Optimize memory back-up process

Message ID	20200224065414.36524-5-zhang.zhanghailiang@huawei.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=a4nG=4M=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7312C20675 From: zhanghailiang <zhang.zhanghailiang@huawei.com> To: <qemu-devel@nongnu.org> Subject: [PATCH V2 4/8] COLO: Optimize memory back-up process Date: Mon, 24 Feb 2020 14:54:10 +0800 Message-ID: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com> In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com> References: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Precedence: list Cc: danielcho@qnap.com, zhanghailiang <zhang.zhanghailiang@huawei.com>, dgilbert@redhat.com, quintela@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	Optimize VM's downtime while do checkpoint in COLO \| expand [V2,0/8] Optimize VM's downtime while do checkpoint in COLO [V2,1/8] migration: fix COLO broken caused by a previous commit [V2,2/8] migration/colo: wrap incoming checkpoint process into new helper [V2,3/8] savevm: Don't call colo_init_ram_cache twice [V2,4/8] COLO: Optimize memory back-up process [V2,6/8] migration: recognize COLO as part of activating process [V2,7/8] COLO: Migrate dirty pages during the gap of checkpointing [V2,8/8] migration/colo: Only flush ram cache while do checkpoint

Message ID

20200224065414.36524-5-zhang.zhanghailiang@huawei.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7312C20675
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: <qemu-devel@nongnu.org>
Subject: [PATCH V2 4/8] COLO: Optimize memory back-up process
Date: Mon, 24 Feb 2020 14:54:10 +0800
Message-ID: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>
References: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
Precedence: list
Cc: danielcho@qnap.com, zhanghailiang <zhang.zhanghailiang@huawei.com>,
 dgilbert@redhat.com, quintela@redhat.com
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Series

Optimize VM's downtime while do checkpoint in COLO | expand

Commit Message

Zhanghailiang Feb. 24, 2020, 6:54 a.m. UTC

This patch will reduce the downtime of VM for the initial process,
Privously, we copied all these memory in preparing stage of COLO
while we need to stop VM, which is a time-consuming process.
Here we optimize it by a trick, back-up every page while in migration
process while COLO is enabled, though it affects the speed of the
migration, but it obviously reduce the downtime of back-up all SVM'S
memory in COLO preparing stage.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c |  3 +++
 migration/ram.c  | 68 +++++++++++++++++++++++++++++++++++-------------
 migration/ram.h  |  1 +
 3 files changed, 54 insertions(+), 18 deletions(-)

Comments

Daniel Cho Feb. 25, 2020, 2:52 a.m. UTC | #1

Hi Hailiang,

With version 2, the code in migration/ram.c

+            if (migration_incoming_colo_enabled()) {
+                if (migration_incoming_in_colo_state()) {
+                    /* In COLO stage, put all pages into cache temporarily */
+                    host = colo_cache_from_block_offset(block, addr);
+                } else {
+                   /*
+                    * In migration stage but before COLO stage,
+                    * Put all pages into both cache and SVM's memory.
+                    */
+                    host_bak = colo_cache_from_block_offset(block, addr);
+                }
             }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }

host = colo_cache_from_block_offset(block, addr);
host_bak = colo_cache_from_block_offset(block, addr);
Does it cause the "if(!host)" will go break if the condition goes
"host_bak = colo_cache_from_block_offset(block, addr);" ?

Best regards,
Daniel Cho

zhanghailiang <zhang.zhanghailiang@huawei.com> 於 2020年2月24日 週一 下午2:55寫道：
>
> This patch will reduce the downtime of VM for the initial process,
> Privously, we copied all these memory in preparing stage of COLO
> while we need to stop VM, which is a time-consuming process.
> Here we optimize it by a trick, back-up every page while in migration
> process while COLO is enabled, though it affects the speed of the
> migration, but it obviously reduce the downtime of back-up all SVM'S
> memory in COLO preparing stage.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c |  3 +++
>  migration/ram.c  | 68 +++++++++++++++++++++++++++++++++++-------------
>  migration/ram.h  |  1 +
>  3 files changed, 54 insertions(+), 18 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 93c5a452fb..44942c4e23 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -26,6 +26,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "migration/failover.h"
> +#include "migration/ram.h"
>  #ifdef CONFIG_REPLICATION
>  #include "replication.h"
>  #endif
> @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque)
>       */
>      qemu_file_set_blocking(mis->from_src_file, true);
>
> +    colo_incoming_start_dirty_log();
> +
>      bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
>      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
>      object_unref(OBJECT(bioc));
> diff --git a/migration/ram.c b/migration/ram.c
> index ed23ed1c7c..ebf9e6ba51 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
>               * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
>               * guest memory.
>               */
> +
>              block->bmap = bitmap_new(pages);
>              bitmap_set(block->bmap, 0, pages);
>              block->clear_bmap_shift = shift;
> @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
>                  }
>                  return -errno;
>              }
> -            memcpy(block->colo_cache, block->host, block->used_length);
>          }
>      }
>
> @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
>
>          RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>              unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
> -
>              block->bmap = bitmap_new(pages);
> -            bitmap_set(block->bmap, 0, pages);
>          }
>      }
> -    ram_state = g_new0(RAMState, 1);
> -    ram_state->migration_dirty_pages = 0;
> -    qemu_mutex_init(&ram_state->bitmap_mutex);
> -    memory_global_dirty_log_start();
>
> +    ram_state_init(&ram_state);
>      return 0;
>  }
>
> +/* TODO: duplicated with ram_init_bitmaps */
> +void colo_incoming_start_dirty_log(void)
> +{
> +    RAMBlock *block = NULL;
> +    /* For memory_global_dirty_log_start below. */
> +    qemu_mutex_lock_iothread();
> +    qemu_mutex_lock_ramlist();
> +
> +    memory_global_dirty_log_sync();
> +    WITH_RCU_READ_LOCK_GUARD() {
> +        RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> +            ramblock_sync_dirty_bitmap(ram_state, block);
> +            /* Discard this dirty bitmap record */
> +            bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
> +        }
> +        memory_global_dirty_log_start();
> +    }
> +    ram_state->migration_dirty_pages = 0;
> +    qemu_mutex_unlock_ramlist();
> +    qemu_mutex_unlock_iothread();
> +}
> +
>  /* It is need to hold the global lock to call this helper */
>  void colo_release_ram_cache(void)
>  {
> @@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void)
>              }
>          }
>      }
> -    qemu_mutex_destroy(&ram_state->bitmap_mutex);
> -    g_free(ram_state);
> -    ram_state = NULL;
> +    ram_state_cleanup(&ram_state);
>  }
>
>  /**
> @@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void)
>              ramblock_sync_dirty_bitmap(ram_state, block);
>          }
>      }
> -
>      trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
>      WITH_RCU_READ_LOCK_GUARD() {
>          block = QLIST_FIRST_RCU(&ram_list.blocks);
> @@ -3348,7 +3362,7 @@ static int ram_load_precopy(QEMUFile *f)
>
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr, total_ram_bytes;
> -        void *host = NULL;
> +        void *host = NULL, *host_bak = NULL;
>          uint8_t ch;
>
>          /*
> @@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>              RAMBlock *block = ram_block_from_stream(f, flags);
>
> +            host = host_from_ram_block_offset(block, addr);
>              /*
> -             * After going into COLO, we should load the Page into colo_cache.
> +             * After going into COLO stage, we should not load the page
> +             * into SVM's memory diretly, we put them into colo_cache firstly.
> +             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
> +             * Privously, we copied all these memory in preparing stage of COLO
> +             * while we need to stop VM, which is a time-consuming process.
> +             * Here we optimize it by a trick, back-up every page while in
> +             * migration process while COLO is enabled, though it affects the
> +             * speed of the migration, but it obviously reduce the downtime of
> +             * back-up all SVM'S memory in COLO preparing stage.
>               */
> -            if (migration_incoming_in_colo_state()) {
> -                host = colo_cache_from_block_offset(block, addr);
> -            } else {
> -                host = host_from_ram_block_offset(block, addr);
> +            if (migration_incoming_colo_enabled()) {
> +                if (migration_incoming_in_colo_state()) {
> +                    /* In COLO stage, put all pages into cache temporarily */
> +                    host = colo_cache_from_block_offset(block, addr);
> +                } else {
> +                   /*
> +                    * In migration stage but before COLO stage,
> +                    * Put all pages into both cache and SVM's memory.
> +                    */
> +                    host_bak = colo_cache_from_block_offset(block, addr);
> +                }
>              }
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
>              }
> -
>              if (!migration_incoming_in_colo_state()) {
>                  ramblock_recv_bitmap_set(block, host);
>              }
> @@ -3506,6 +3535,9 @@ static int ram_load_precopy(QEMUFile *f)
>          if (!ret) {
>              ret = qemu_file_get_error(f);
>          }
> +        if (!ret && host_bak) {
> +            memcpy(host_bak, host, TARGET_PAGE_SIZE);
> +        }
>      }
>
>      ret |= wait_for_decompress_done();
> diff --git a/migration/ram.h b/migration/ram.h
> index a553d40751..5ceaff7cb4 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  /* ram cache */
>  int colo_init_ram_cache(void);
>  void colo_release_ram_cache(void);
> +void colo_incoming_start_dirty_log(void);
>
>  #endif
> --
> 2.21.0
>
>

Zhanghailiang Feb. 25, 2020, 3:56 a.m. UTC | #2

Hi,


> -----Original Message-----
> From: Daniel Cho [mailto:danielcho@qnap.com]
> Sent: Tuesday, February 25, 2020 10:53 AM
> To: Zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; Dr. David Alan Gilbert
> <dgilbert@redhat.com>
> Subject: Re: [PATCH V2 4/8] COLO: Optimize memory back-up process
> 
> Hi Hailiang,
> 
> With version 2, the code in migration/ram.c
> 
> +            if (migration_incoming_colo_enabled()) {
> +                if (migration_incoming_in_colo_state()) {
> +                    /* In COLO stage, put all pages into cache
> temporarily */
> +                    host = colo_cache_from_block_offset(block, addr);
> +                } else {
> +                   /*
> +                    * In migration stage but before COLO stage,
> +                    * Put all pages into both cache and SVM's memory.
> +                    */
> +                    host_bak = colo_cache_from_block_offset(block,
> addr);
> +                }
>              }
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT,
> addr);
>                  ret = -EINVAL;
>                  break;
>              }
> 
> host = colo_cache_from_block_offset(block, addr);
> host_bak = colo_cache_from_block_offset(block, addr);
> Does it cause the "if(!host)" will go break if the condition goes
> "host_bak = colo_cache_from_block_offset(block, addr);" ?
> 

That will not happen, you may have missed this parts.

@@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
+            host = host_from_ram_block_offset(block, addr);
             /*

We have given host a value unconditionally.


> Best regards,
> Daniel Cho
> 
> zhanghailiang <zhang.zhanghailiang@huawei.com> 於 2020年2月24日 週
> 一 下午2:55寫道：
> >
> > This patch will reduce the downtime of VM for the initial process,
> > Privously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > ---
> >  migration/colo.c |  3 +++
> >  migration/ram.c  | 68
> +++++++++++++++++++++++++++++++++++-------------
> >  migration/ram.h  |  1 +
> >  3 files changed, 54 insertions(+), 18 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 93c5a452fb..44942c4e23 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -26,6 +26,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "migration/failover.h"
> > +#include "migration/ram.h"
> >  #ifdef CONFIG_REPLICATION
> >  #include "replication.h"
> >  #endif
> > @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void
> *opaque)
> >       */
> >      qemu_file_set_blocking(mis->from_src_file, true);
> >
> > +    colo_incoming_start_dirty_log();
> > +
> >      bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
> >      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
> >      object_unref(OBJECT(bioc));
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ed23ed1c7c..ebf9e6ba51 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
> >               * dirty_memory[DIRTY_MEMORY_MIGRATION] don't
> include the whole
> >               * guest memory.
> >               */
> > +
> >              block->bmap = bitmap_new(pages);
> >              bitmap_set(block->bmap, 0, pages);
> >              block->clear_bmap_shift = shift;
> > @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
> >                  }
> >                  return -errno;
> >              }
> > -            memcpy(block->colo_cache, block->host,
> block->used_length);
> >          }
> >      }
> >
> > @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
> >
> >          RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> >              unsigned long pages = block->max_length >>
> TARGET_PAGE_BITS;
> > -
> >              block->bmap = bitmap_new(pages);
> > -            bitmap_set(block->bmap, 0, pages);
> >          }
> >      }
> > -    ram_state = g_new0(RAMState, 1);
> > -    ram_state->migration_dirty_pages = 0;
> > -    qemu_mutex_init(&ram_state->bitmap_mutex);
> > -    memory_global_dirty_log_start();
> >
> > +    ram_state_init(&ram_state);
> >      return 0;
> >  }
> >
> > +/* TODO: duplicated with ram_init_bitmaps */
> > +void colo_incoming_start_dirty_log(void)
> > +{
> > +    RAMBlock *block = NULL;
> > +    /* For memory_global_dirty_log_start below. */
> > +    qemu_mutex_lock_iothread();
> > +    qemu_mutex_lock_ramlist();
> > +
> > +    memory_global_dirty_log_sync();
> > +    WITH_RCU_READ_LOCK_GUARD() {
> > +        RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> > +            ramblock_sync_dirty_bitmap(ram_state, block);
> > +            /* Discard this dirty bitmap record */
> > +            bitmap_zero(block->bmap, block->max_length >>
> TARGET_PAGE_BITS);
> > +        }
> > +        memory_global_dirty_log_start();
> > +    }
> > +    ram_state->migration_dirty_pages = 0;
> > +    qemu_mutex_unlock_ramlist();
> > +    qemu_mutex_unlock_iothread();
> > +}
> > +
> >  /* It is need to hold the global lock to call this helper */
> >  void colo_release_ram_cache(void)
> >  {
> > @@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void)
> >              }
> >          }
> >      }
> > -    qemu_mutex_destroy(&ram_state->bitmap_mutex);
> > -    g_free(ram_state);
> > -    ram_state = NULL;
> > +    ram_state_cleanup(&ram_state);
> >  }
> >
> >  /**
> > @@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void)
> >              ramblock_sync_dirty_bitmap(ram_state, block);
> >          }
> >      }
> > -
> >
> trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
> >      WITH_RCU_READ_LOCK_GUARD() {
> >          block = QLIST_FIRST_RCU(&ram_list.blocks);
> > @@ -3348,7 +3362,7 @@ static int ram_load_precopy(QEMUFile *f)
> >
> >      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >          ram_addr_t addr, total_ram_bytes;
> > -        void *host = NULL;
> > +        void *host = NULL, *host_bak = NULL;
> >          uint8_t ch;
> >
> >          /*
> > @@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
> >                       RAM_SAVE_FLAG_COMPRESS_PAGE |
> RAM_SAVE_FLAG_XBZRLE)) {
> >              RAMBlock *block = ram_block_from_stream(f, flags);
> >
> > +            host = host_from_ram_block_offset(block, addr);
> >              /*
> > -             * After going into COLO, we should load the Page into
> colo_cache.
> > +             * After going into COLO stage, we should not load the page
> > +             * into SVM's memory diretly, we put them into colo_cache
> firstly.
> > +             * NOTE: We need to keep a copy of SVM's ram in
> colo_cache.
> > +             * Privously, we copied all these memory in preparing stage
> of COLO
> > +             * while we need to stop VM, which is a time-consuming
> process.
> > +             * Here we optimize it by a trick, back-up every page while
> in
> > +             * migration process while COLO is enabled, though it
> affects the
> > +             * speed of the migration, but it obviously reduce the
> downtime of
> > +             * back-up all SVM'S memory in COLO preparing stage.
> >               */
> > -            if (migration_incoming_in_colo_state()) {
> > -                host = colo_cache_from_block_offset(block, addr);
> > -            } else {
> > -                host = host_from_ram_block_offset(block, addr);
> > +            if (migration_incoming_colo_enabled()) {
> > +                if (migration_incoming_in_colo_state()) {
> > +                    /* In COLO stage, put all pages into cache
> temporarily */
> > +                    host = colo_cache_from_block_offset(block, addr);
> > +                } else {
> > +                   /*
> > +                    * In migration stage but before COLO stage,
> > +                    * Put all pages into both cache and SVM's
> memory.
> > +                    */
> > +                    host_bak = colo_cache_from_block_offset(block,
> addr);
> > +                }
> >              }
> >              if (!host) {
> >                  error_report("Illegal RAM offset " RAM_ADDR_FMT,
> addr);
> >                  ret = -EINVAL;
> >                  break;
> >              }
> > -
> >              if (!migration_incoming_in_colo_state()) {
> >                  ramblock_recv_bitmap_set(block, host);
> >              }
> > @@ -3506,6 +3535,9 @@ static int ram_load_precopy(QEMUFile *f)
> >          if (!ret) {
> >              ret = qemu_file_get_error(f);
> >          }
> > +        if (!ret && host_bak) {
> > +            memcpy(host_bak, host, TARGET_PAGE_SIZE);
> > +        }
> >      }
> >
> >      ret |= wait_for_decompress_done();
> > diff --git a/migration/ram.h b/migration/ram.h
> > index a553d40751..5ceaff7cb4 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s,
> RAMBlock *rb);
> >  /* ram cache */
> >  int colo_init_ram_cache(void);
> >  void colo_release_ram_cache(void);
> > +void colo_incoming_start_dirty_log(void);
> >
> >  #endif
> > --
> > 2.21.0
> >
> >

Dr. David Alan Gilbert March 12, 2020, 6:44 p.m. UTC | #3

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This patch will reduce the downtime of VM for the initial process,
> Privously, we copied all these memory in preparing stage of COLO
> while we need to stop VM, which is a time-consuming process.
> Here we optimize it by a trick, back-up every page while in migration
> process while COLO is enabled, though it affects the speed of the
> migration, but it obviously reduce the downtime of back-up all SVM'S
> memory in COLO preparing stage.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I'll queue this as well; I'm going to clean up some minor things:

> ---
>  migration/colo.c |  3 +++
>  migration/ram.c  | 68 +++++++++++++++++++++++++++++++++++-------------
>  migration/ram.h  |  1 +
>  3 files changed, 54 insertions(+), 18 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 93c5a452fb..44942c4e23 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -26,6 +26,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "migration/failover.h"
> +#include "migration/ram.h"
>  #ifdef CONFIG_REPLICATION
>  #include "replication.h"
>  #endif
> @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque)
>       */
>      qemu_file_set_blocking(mis->from_src_file, true);
>  
> +    colo_incoming_start_dirty_log();
> +
>      bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
>      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
>      object_unref(OBJECT(bioc));
> diff --git a/migration/ram.c b/migration/ram.c
> index ed23ed1c7c..ebf9e6ba51 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
>               * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
>               * guest memory.
>               */
> +

That change is nice, but shouldn't really be here.

>              block->bmap = bitmap_new(pages);
>              bitmap_set(block->bmap, 0, pages);
>              block->clear_bmap_shift = shift;
> @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
>                  }
>                  return -errno;
>              }
> -            memcpy(block->colo_cache, block->host, block->used_length);
>          }
>      }
>  
> @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
>  
>          RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>              unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
> -
>              block->bmap = bitmap_new(pages);
> -            bitmap_set(block->bmap, 0, pages);
>          }
>      }
> -    ram_state = g_new0(RAMState, 1);
> -    ram_state->migration_dirty_pages = 0;
> -    qemu_mutex_init(&ram_state->bitmap_mutex);
> -    memory_global_dirty_log_start();
>  
> +    ram_state_init(&ram_state);
>      return 0;
>  }
>  
> +/* TODO: duplicated with ram_init_bitmaps */
> +void colo_incoming_start_dirty_log(void)
> +{
> +    RAMBlock *block = NULL;
> +    /* For memory_global_dirty_log_start below. */
> +    qemu_mutex_lock_iothread();
> +    qemu_mutex_lock_ramlist();
> +
> +    memory_global_dirty_log_sync();
> +    WITH_RCU_READ_LOCK_GUARD() {
> +        RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> +            ramblock_sync_dirty_bitmap(ram_state, block);
> +            /* Discard this dirty bitmap record */
> +            bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
> +        }
> +        memory_global_dirty_log_start();
> +    }
> +    ram_state->migration_dirty_pages = 0;
> +    qemu_mutex_unlock_ramlist();
> +    qemu_mutex_unlock_iothread();
> +}
> +
>  /* It is need to hold the global lock to call this helper */
>  void colo_release_ram_cache(void)
>  {
> @@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void)
>              }
>          }
>      }
> -    qemu_mutex_destroy(&ram_state->bitmap_mutex);
> -    g_free(ram_state);
> -    ram_state = NULL;
> +    ram_state_cleanup(&ram_state);
>  }
>  
>  /**
> @@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void)
>              ramblock_sync_dirty_bitmap(ram_state, block);
>          }
>      }
> -

I'll remove that

>      trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
>      WITH_RCU_READ_LOCK_GUARD() {
>          block = QLIST_FIRST_RCU(&ram_list.blocks);
> @@ -3348,7 +3362,7 @@ static int ram_load_precopy(QEMUFile *f)
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr, total_ram_bytes;
> -        void *host = NULL;
> +        void *host = NULL, *host_bak = NULL;
>          uint8_t ch;
>  
>          /*
> @@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>              RAMBlock *block = ram_block_from_stream(f, flags);
>  
> +            host = host_from_ram_block_offset(block, addr);
>              /*
> -             * After going into COLO, we should load the Page into colo_cache.
> +             * After going into COLO stage, we should not load the page
> +             * into SVM's memory diretly, we put them into colo_cache firstly.
                                        ^ typo - c
> +             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
> +             * Privously, we copied all these memory in preparing stage of COLO
                    ^ typo - e

> +             * while we need to stop VM, which is a time-consuming process.
> +             * Here we optimize it by a trick, back-up every page while in
> +             * migration process while COLO is enabled, though it affects the
> +             * speed of the migration, but it obviously reduce the downtime of
> +             * back-up all SVM'S memory in COLO preparing stage.
>               */
> -            if (migration_incoming_in_colo_state()) {
> -                host = colo_cache_from_block_offset(block, addr);
> -            } else {
> -                host = host_from_ram_block_offset(block, addr);
> +            if (migration_incoming_colo_enabled()) {
> +                if (migration_incoming_in_colo_state()) {
> +                    /* In COLO stage, put all pages into cache temporarily */
> +                    host = colo_cache_from_block_offset(block, addr);
> +                } else {
> +                   /*
> +                    * In migration stage but before COLO stage,
> +                    * Put all pages into both cache and SVM's memory.
> +                    */
> +                    host_bak = colo_cache_from_block_offset(block, addr);
> +                }
>              }
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
>              }
> -
>              if (!migration_incoming_in_colo_state()) {
>                  ramblock_recv_bitmap_set(block, host);
>              }
> @@ -3506,6 +3535,9 @@ static int ram_load_precopy(QEMUFile *f)
>          if (!ret) {
>              ret = qemu_file_get_error(f);
>          }
> +        if (!ret && host_bak) {
> +            memcpy(host_bak, host, TARGET_PAGE_SIZE);
> +        }
>      }
>  
>      ret |= wait_for_decompress_done();
> diff --git a/migration/ram.h b/migration/ram.h
> index a553d40751..5ceaff7cb4 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  /* ram cache */
>  int colo_init_ram_cache(void);
>  void colo_release_ram_cache(void);
> +void colo_incoming_start_dirty_log(void);
>  
>  #endif
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

diff --git a/migration/colo.c b/migration/colo.c
index 93c5a452fb..44942c4e23 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@ 
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "migration/failover.h"
+#include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
 #include "replication.h"
 #endif
@@ -845,6 +846,8 @@  void *colo_process_incoming_thread(void *opaque)
      */
     qemu_file_set_blocking(mis->from_src_file, true);
 
+    colo_incoming_start_dirty_log();
+
     bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
     fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
     object_unref(OBJECT(bioc));
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..ebf9e6ba51 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2277,6 +2277,7 @@  static void ram_list_init_bitmaps(void)
              * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
              * guest memory.
              */
+
             block->bmap = bitmap_new(pages);
             bitmap_set(block->bmap, 0, pages);
             block->clear_bmap_shift = shift;
@@ -2986,7 +2987,6 @@  int colo_init_ram_cache(void)
                 }
                 return -errno;
             }
-            memcpy(block->colo_cache, block->host, block->used_length);
         }
     }
 
@@ -3000,19 +3000,36 @@  int colo_init_ram_cache(void)
 
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
-
             block->bmap = bitmap_new(pages);
-            bitmap_set(block->bmap, 0, pages);
         }
     }
-    ram_state = g_new0(RAMState, 1);
-    ram_state->migration_dirty_pages = 0;
-    qemu_mutex_init(&ram_state->bitmap_mutex);
-    memory_global_dirty_log_start();
 
+    ram_state_init(&ram_state);
     return 0;
 }
 
+/* TODO: duplicated with ram_init_bitmaps */
+void colo_incoming_start_dirty_log(void)
+{
+    RAMBlock *block = NULL;
+    /* For memory_global_dirty_log_start below. */
+    qemu_mutex_lock_iothread();
+    qemu_mutex_lock_ramlist();
+
+    memory_global_dirty_log_sync();
+    WITH_RCU_READ_LOCK_GUARD() {
+        RAMBLOCK_FOREACH_NOT_IGNORED(block) {
+            ramblock_sync_dirty_bitmap(ram_state, block);
+            /* Discard this dirty bitmap record */
+            bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
+        }
+        memory_global_dirty_log_start();
+    }
+    ram_state->migration_dirty_pages = 0;
+    qemu_mutex_unlock_ramlist();
+    qemu_mutex_unlock_iothread();
+}
+
 /* It is need to hold the global lock to call this helper */
 void colo_release_ram_cache(void)
 {
@@ -3032,9 +3049,7 @@  void colo_release_ram_cache(void)
             }
         }
     }
-    qemu_mutex_destroy(&ram_state->bitmap_mutex);
-    g_free(ram_state);
-    ram_state = NULL;
+    ram_state_cleanup(&ram_state);
 }
 
 /**
@@ -3302,7 +3317,6 @@  static void colo_flush_ram_cache(void)
             ramblock_sync_dirty_bitmap(ram_state, block);
         }
     }
-
     trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
     WITH_RCU_READ_LOCK_GUARD() {
         block = QLIST_FIRST_RCU(&ram_list.blocks);
@@ -3348,7 +3362,7 @@  static int ram_load_precopy(QEMUFile *f)
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host = NULL;
+        void *host = NULL, *host_bak = NULL;
         uint8_t ch;
 
         /*
@@ -3379,20 +3393,35 @@  static int ram_load_precopy(QEMUFile *f)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
+            host = host_from_ram_block_offset(block, addr);
             /*
-             * After going into COLO, we should load the Page into colo_cache.
+             * After going into COLO stage, we should not load the page
+             * into SVM's memory diretly, we put them into colo_cache firstly.
+             * NOTE: We need to keep a copy of SVM's ram in colo_cache.
+             * Privously, we copied all these memory in preparing stage of COLO
+             * while we need to stop VM, which is a time-consuming process.
+             * Here we optimize it by a trick, back-up every page while in
+             * migration process while COLO is enabled, though it affects the
+             * speed of the migration, but it obviously reduce the downtime of
+             * back-up all SVM'S memory in COLO preparing stage.
              */
-            if (migration_incoming_in_colo_state()) {
-                host = colo_cache_from_block_offset(block, addr);
-            } else {
-                host = host_from_ram_block_offset(block, addr);
+            if (migration_incoming_colo_enabled()) {
+                if (migration_incoming_in_colo_state()) {
+                    /* In COLO stage, put all pages into cache temporarily */
+                    host = colo_cache_from_block_offset(block, addr);
+                } else {
+                   /*
+                    * In migration stage but before COLO stage,
+                    * Put all pages into both cache and SVM's memory.
+                    */
+                    host_bak = colo_cache_from_block_offset(block, addr);
+                }
             }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
-
             if (!migration_incoming_in_colo_state()) {
                 ramblock_recv_bitmap_set(block, host);
             }
@@ -3506,6 +3535,9 @@  static int ram_load_precopy(QEMUFile *f)
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
+        if (!ret && host_bak) {
+            memcpy(host_bak, host, TARGET_PAGE_SIZE);
+        }
     }
 
     ret |= wait_for_decompress_done();
diff --git a/migration/ram.h b/migration/ram.h
index a553d40751..5ceaff7cb4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,5 +66,6 @@  int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_incoming_start_dirty_log(void);
 
 #endif

[V2,4/8] COLO: Optimize memory back-up process

Commit Message

Comments

Patch