From patchwork Mon Mar 28 04:16:05 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jitendra Kolhe
X-Patchwork-Id: 8677811
From: Jitendra Kolhe
To: qemu-devel@nongnu.org
Date: Mon, 28 Mar 2016 09:46:05 +0530
Message-Id: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com>
X-Mailer: git-send-email 1.8.3.1
Cc: JBottomley@Odin.com, ehabkost@redhat.com, crosthwaite.peter@gmail.com,
 simhan@hpe.com, quintela@redhat.com, armbru@redhat.com,
 lcapitulino@redhat.com, jitendra.kolhe@hpe.com, borntraeger@de.ibm.com,
 mst@redhat.com, mohan_parthasarathy@hpe.com, stefanha@redhat.com,
 den@openvz.org, amit.shah@redhat.com, pbonzini@redhat.com,
 dgilbert@redhat.com, rth@twiddle.net
Subject: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released
 by virtio-balloon driver.

While measuring live migration performance for a qemu/kvm guest, we
observed that qemu does not track guest ram pages which are released by
the guest balloon driver and treats such pages like any other guest ram
page. This directly increases the overall migration time for a guest
which has released (ballooned out) memory to the host.

On large systems, where guests can be configured with 1TB of memory and
a considerable amount of that memory is released to the host by the
balloon driver, the migration time gets worse.

The solution proposed below is local to qemu (it does not require any
modification to the Linux kernel or to any guest driver).
We have verified the fix for large guests (1TB) on HPE Superdome X
(which can support up to 240 cores and 12TB of memory). In the case
where 90% of memory is released by the balloon driver, the migration
time for an idle guest reduces from ~1200 sec to ~600 sec.

Detail:
During live migration, as part of the 1st iteration,
ram_save_iterate() -> ram_find_and_save_block() will try to migrate ram
pages which were released by the virtio-balloon driver as part of
dynamic memory delete. Even though the pages returned to the host by the
virtio-balloon driver are zero pages, the migration algorithm still ends
up scanning each entire page via ram_find_and_save_block() ->
ram_save_page()/ram_save_compressed_page() -> save_zero_page() ->
is_zero_range(). We also end up sending some control information over
the network for these pages during migration. This adds to the total
migration time.

The proposed fix uses the existing bitmap infrastructure to create a
virtio-balloon bitmap. Each bit in the bitmap represents a guest ram
page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the
entire guest ram up to the maximum configured memory. Guest ram pages
claimed by the virtio-balloon driver are represented by 1 in the bitmap.
During live migration, each guest ram page (host VA offset) is checked
against the virtio-balloon bitmap; if the bit is set, the corresponding
ram page is excluded from scanning and from sending control information
during migration. The bitmap is also migrated to the target as part of
every ram_save_iterate loop, and after the guest is stopped the
remaining balloon bitmap is migrated as part of the balloon driver
save/load interface.

With the proposed fix, the average migration time for an idle guest with
1TB maximum memory and 64 vCPUs
 - reduces from ~1200 sec to ~600 sec, with guest memory ballooned down
   to 128GB (~10% of 1TB),
 - reduces from ~1300 sec to ~1200 sec (7%), with guest memory ballooned
   down to 896GB (~90% of 1TB),
 - with no ballooning configured, we don't expect to see any impact on
   total migration time.

The optimization gets temporarily disabled if a balloon operation is in
progress. Since the optimization skips scanning and migrating control
information for ballooned-out pages, we might skip guest ram pages in
cases where the guest balloon driver has freed the ram page to the guest
but has not yet informed the host/qemu about it
(VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case, with the optimization
enabled, we might skip migrating ram pages which the guest is using.
Since this problem is specific to balloon leak, the
balloon-operation-in-progress check can be restricted to a
balloon-leak-in-progress check.

The optimization also gets permanently disabled (for all subsequent
migrations) if any migration uses the postcopy capability. With
postcopy, the balloon bitmap would have to be sent after vm_stop, which
has a significant impact on downtime. Moreover, since applications in
the guest will not actually fault on ram pages which are already
ballooned out, the proposed optimization would not show any improvement
in migration time during postcopy.

Signed-off-by: Jitendra Kolhe
---
Changed in v2:
 - Resolved compilation issue for qemu-user binaries in exec.c
 - Localize balloon bitmap test to save_zero_page().
 - Updated version string for newly added migration capability to 2.7.
 - Made minor modifications to patch commit text.
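
Illustration only, not part of the patch: qemu_balloon_bitmap_save() and
qemu_balloon_bitmap_load() in the diff below transfer the balloon bitmap
as (start, length) runs of set bits, ending with a run of length 0. The
following is a minimal standalone sketch of that idea. It assumes a
plain in-memory array instead of a QEMUFile, and the names bit_is_set,
bit_set, struct run, encode_runs and decode_runs are hypothetical
helpers, not QEMU APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-ins for QEMU's test_bit()/set_bit(). */
static int bit_is_set(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

static void bit_set(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* One run of consecutive ballooned pages; len == 0 ends the stream. */
struct run {
    uint64_t start;
    uint64_t len;
};

/*
 * Encode the set bits of map[0, npages) as runs followed by a terminator.
 * Returns the number of entries written (terminator included), or 0 if
 * max_runs is too small to hold them.
 */
static size_t encode_runs(const unsigned long *map, size_t npages,
                          struct run *out, size_t max_runs)
{
    size_t n = 0, i = 0;

    while (i < npages) {
        while (i < npages && !bit_is_set(map, i)) {
            i++;                        /* skip pages the guest still owns */
        }
        if (i == npages) {
            break;
        }
        size_t start = i;
        while (i < npages && bit_is_set(map, i)) {
            i++;                        /* extend the run of ballooned pages */
        }
        if (n + 1 >= max_runs) {
            return 0;                   /* keep one slot for the terminator */
        }
        out[n].start = start;
        out[n].len = i - start;
        n++;
    }
    if (n >= max_runs) {
        return 0;
    }
    out[n].start = npages;
    out[n].len = 0;
    return n + 1;
}

/* Rebuild the bitmap on the destination; bits outside npages are ignored. */
static void decode_runs(const struct run *in, unsigned long *map,
                        size_t npages)
{
    for (size_t r = 0; in[r].len != 0; r++) {
        for (uint64_t k = 0; k < in[r].len; k++) {
            if (in[r].start + k < npages) {
                bit_set(map, (size_t)(in[r].start + k));
            }
        }
    }
}
```

For a mostly ballooned-out guest the set bits form a few long runs, so
such an encoding stays small compared with sending the raw bitmap. The
sketch always encodes the whole bitmap, whereas the patch additionally
restricts later passes to the [min, max] offset window touched since the
previous transfer.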
 balloon.c                          | 253 ++++++++++++++++++++++++++++++++++++-
 exec.c                             |   3 +
 hw/virtio/virtio-balloon.c         |  35 ++++-
 include/hw/virtio/virtio-balloon.h |   1 +
 include/migration/migration.h      |   1 +
 include/sysemu/balloon.h           |  15 ++-
 migration/migration.c              |   9 ++
 migration/ram.c                    |  31 ++++-
 qapi-schema.json                   |   5 +-
 9 files changed, 341 insertions(+), 12 deletions(-)

diff --git a/balloon.c b/balloon.c
index f2ef50c..1c2d228 100644
--- a/balloon.c
+++ b/balloon.c
@@ -33,11 +33,34 @@
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qjson.h"
+#include "exec/ram_addr.h"
+#include "migration/migration.h"
+
+#define BALLOON_BITMAP_DISABLE_FLAG -1UL
+
+typedef enum {
+    BALLOON_BITMAP_DISABLE_NONE = 1, /* Enabled */
+    BALLOON_BITMAP_DISABLE_CURRENT,
+    BALLOON_BITMAP_DISABLE_PERMANENT,
+} BalloonBitmapDisableState;
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonInProgress *balloon_in_progress_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
+static unsigned long balloon_bitmap_pages;
+static unsigned int  balloon_bitmap_pfn_shift;
+static QemuMutex balloon_bitmap_mutex;
+static bool balloon_bitmap_xfered;
+static unsigned long balloon_min_bitmap_offset;
+static unsigned long balloon_max_bitmap_offset;
+static BalloonBitmapDisableState balloon_bitmap_disable_state;
+
+static struct BitmapRcu {
+    struct rcu_head rcu;
+    unsigned long *bmap;
+} *balloon_bitmap_rcu;
 
 bool qemu_balloon_is_inhibited(void)
 {
@@ -49,6 +72,21 @@ void qemu_balloon_inhibit(bool state)
     balloon_inhibited = state;
 }
 
+void qemu_mutex_lock_balloon_bitmap(void)
+{
+    qemu_mutex_lock(&balloon_bitmap_mutex);
+}
+
+void qemu_mutex_unlock_balloon_bitmap(void)
+{
+    qemu_mutex_unlock(&balloon_bitmap_mutex);
+}
+
+void qemu_balloon_reset_bitmap_data(void)
+{
+    balloon_bitmap_xfered = false;
+}
+
 static bool have_balloon(Error **errp)
 {
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
@@ -65,9 +103,12 @@ static bool have_balloon(Error **errp)
 }
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonInProgress *in_progress_func,
+                             void *opaque, int pfn_shift)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
+    if (balloon_event_fn || balloon_stat_fn ||
+        balloon_in_progress_fn || balloon_opaque) {
         /* We're already registered one balloon handler. How many can
          * a guest really have?
          */
@@ -75,17 +116,39 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
     }
     balloon_event_fn = event_func;
     balloon_stat_fn = stat_func;
+    balloon_in_progress_fn = in_progress_func;
     balloon_opaque = opaque;
+
+    qemu_mutex_init(&balloon_bitmap_mutex);
+    balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_NONE;
+    balloon_bitmap_pfn_shift = pfn_shift;
+    balloon_bitmap_pages = (last_ram_offset() >> balloon_bitmap_pfn_shift);
+    balloon_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+    balloon_bitmap_rcu->bmap = bitmap_new(balloon_bitmap_pages);
+    bitmap_clear(balloon_bitmap_rcu->bmap, 0, balloon_bitmap_pages);
+
     return 0;
 }
 
+static void balloon_bitmap_free(struct BitmapRcu *bmap)
+{
+    g_free(bmap->bmap);
+    g_free(bmap);
+}
+
 void qemu_remove_balloon_handler(void *opaque)
 {
+    struct BitmapRcu *bitmap = balloon_bitmap_rcu;
     if (balloon_opaque != opaque) {
         return;
     }
+    atomic_rcu_set(&balloon_bitmap_rcu, NULL);
+    if (bitmap) {
+        call_rcu(bitmap, balloon_bitmap_free, rcu);
+    }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_in_progress_fn = NULL;
     balloon_opaque = NULL;
 }
 
@@ -116,3 +179,189 @@ void qmp_balloon(int64_t target, Error **errp)
     trace_balloon_event(balloon_opaque, target);
     balloon_event_fn(balloon_opaque, target);
 }
+
+/* Handle Ram hotplug case, only called in case old < new */
+int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new)
+{
+    struct BitmapRcu *old_bitmap = balloon_bitmap_rcu, *bitmap;
+    unsigned long old_offset, new_offset;
+
+    if (!balloon_bitmap_rcu) {
+        return -1;
+    }
+
+    old_offset = (old >> balloon_bitmap_pfn_shift);
+    new_offset = (new >> balloon_bitmap_pfn_shift);
+
+    bitmap = g_new(struct BitmapRcu, 1);
+    bitmap->bmap = bitmap_new(new_offset);
+
+    qemu_mutex_lock_balloon_bitmap();
+    bitmap_clear(bitmap->bmap, 0,
+                 balloon_bitmap_pages + new_offset - old_offset);
+    bitmap_copy(bitmap->bmap, old_bitmap->bmap, old_offset);
+
+    atomic_rcu_set(&balloon_bitmap_rcu, bitmap);
+    balloon_bitmap_pages += new_offset - old_offset;
+    qemu_mutex_unlock_balloon_bitmap();
+    call_rcu(old_bitmap, balloon_bitmap_free, rcu);
+
+    return 0;
+}
+
+/* Should be called with balloon bitmap mutex lock held */
+int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate)
+{
+    unsigned long *bitmap;
+    unsigned long offset = 0;
+
+    if (!balloon_bitmap_rcu) {
+        return -1;
+    }
+    offset = (addr >> balloon_bitmap_pfn_shift);
+    if (balloon_bitmap_xfered) {
+        if (offset < balloon_min_bitmap_offset) {
+            balloon_min_bitmap_offset = offset;
+        }
+        if (offset > balloon_max_bitmap_offset) {
+            balloon_max_bitmap_offset = offset;
+        }
+    }
+
+    rcu_read_lock();
+    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
+    if (deflate == 0) {
+        set_bit(offset, bitmap);
+    } else {
+        clear_bit(offset, bitmap);
+    }
+    rcu_read_unlock();
+    return 0;
+}
+
+void qemu_balloon_bitmap_setup(void)
+{
+    if (migrate_postcopy_ram()) {
+        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
+    } else if ((!balloon_bitmap_rcu || !migrate_skip_balloon()) &&
+               (balloon_bitmap_disable_state !=
+                BALLOON_BITMAP_DISABLE_PERMANENT)) {
+        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_CURRENT;
+    }
+}
+
+int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr)
+{
+    unsigned long *bitmap;
+    ram_addr_t base;
+    unsigned long nr = 0;
+    int ret = 0;
+
+    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_CURRENT ||
+        balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
+        return 0;
+    }
+    balloon_in_progress_fn(balloon_opaque, &ret);
+    if (ret == 1) {
+        return 0;
+    }
+
+    rcu_read_lock();
+    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
+    base = rb->offset >> balloon_bitmap_pfn_shift;
+    nr = base + (addr >> balloon_bitmap_pfn_shift);
+    if (test_bit(nr, bitmap)) {
+        ret = 1;
+    }
+    rcu_read_unlock();
+    return ret;
+}
+
+int qemu_balloon_bitmap_save(QEMUFile *f)
+{
+    unsigned long *bitmap;
+    unsigned long offset = 0, next = 0, len = 0;
+    unsigned long tmpoffset = 0, tmplimit = 0;
+
+    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
+        qemu_put_be64(f, BALLOON_BITMAP_DISABLE_FLAG);
+        return 0;
+    }
+
+    qemu_mutex_lock_balloon_bitmap();
+    if (balloon_bitmap_xfered) {
+        tmpoffset = balloon_min_bitmap_offset;
+        tmplimit = balloon_max_bitmap_offset;
+    } else {
+        balloon_bitmap_xfered = true;
+        tmpoffset = offset;
+        tmplimit = balloon_bitmap_pages;
+    }
+
+    balloon_min_bitmap_offset = balloon_bitmap_pages;
+    balloon_max_bitmap_offset = 0;
+
+    qemu_put_be64(f, balloon_bitmap_pages);
+    qemu_put_be64(f, tmpoffset);
+    qemu_put_be64(f, tmplimit);
+    rcu_read_lock();
+    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
+    while (tmpoffset < tmplimit) {
+        unsigned long next_set_bit, start_set_bit;
+        next_set_bit = find_next_bit(bitmap, balloon_bitmap_pages, tmpoffset);
+        start_set_bit = next_set_bit;
+        if (next_set_bit == balloon_bitmap_pages) {
+            len = 0;
+            next = start_set_bit;
+            qemu_put_be64(f, next);
+            qemu_put_be64(f, len);
+            break;
+        }
+        next_set_bit = find_next_zero_bit(bitmap,
+                                          balloon_bitmap_pages,
+                                          ++next_set_bit);
+        len = (next_set_bit - start_set_bit);
+        next = start_set_bit;
+        qemu_put_be64(f, next);
+        qemu_put_be64(f, len);
+        tmpoffset = next + len;
+    }
+    rcu_read_unlock();
+    qemu_mutex_unlock_balloon_bitmap();
+    return 0;
+}
+
+int qemu_balloon_bitmap_load(QEMUFile *f)
+{
+    unsigned long *bitmap;
+    unsigned long next = 0, len = 0;
+    unsigned long tmpoffset = 0, tmplimit = 0;
+
+    if (!balloon_bitmap_rcu) {
+        return -1;
+    }
+
+    qemu_mutex_lock_balloon_bitmap();
+    balloon_bitmap_pages = qemu_get_be64(f);
+    if (balloon_bitmap_pages == BALLOON_BITMAP_DISABLE_FLAG) {
+        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
+        qemu_mutex_unlock_balloon_bitmap();
+        return 0;
+    }
+    tmpoffset = qemu_get_be64(f);
+    tmplimit = qemu_get_be64(f);
+    rcu_read_lock();
+    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
+    while (tmpoffset < tmplimit) {
+        next = qemu_get_be64(f);
+        len = qemu_get_be64(f);
+        if (len == 0) {
+            break;
+        }
+        bitmap_set(bitmap, next, len);
+        tmpoffset = next + len;
+    }
+    rcu_read_unlock();
+    qemu_mutex_unlock_balloon_bitmap();
+    return 0;
+}
diff --git a/exec.c b/exec.c
index f398d21..7a448e5 100644
--- a/exec.c
+++ b/exec.c
@@ -43,6 +43,7 @@
 #else /* !CONFIG_USER_ONLY */
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
+#include "sysemu/balloon.h"
 #endif
 #include "exec/cpu-all.h"
 #include "qemu/rcu_queue.h"
@@ -1610,6 +1611,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     if (new_ram_size > old_ram_size) {
         migration_bitmap_extend(old_ram_size, new_ram_size);
         dirty_memory_extend(old_ram_size, new_ram_size);
+        qemu_balloon_bitmap_extend(old_ram_size << TARGET_PAGE_BITS,
+                                   new_ram_size << TARGET_PAGE_BITS);
     }
     /* Keep the list sorted from biggest to smallest block. Unlike QTAILQ,
      * QLIST (which has an RCU-friendly variant) does not have insertion at
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 22ad25c..9f3a4c8 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "qapi-event.h"
 #include "trace.h"
+#include "migration/migration.h"
 
 #if defined(__linux__)
 #include
@@ -214,11 +215,13 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     VirtQueueElement *elem;
     MemoryRegionSection section;
 
+    qemu_mutex_lock_balloon_bitmap();
     for (;;) {
         size_t offset = 0;
         uint32_t pfn;
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
+            qemu_mutex_unlock_balloon_bitmap();
             return;
         }
 
@@ -242,6 +245,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             addr = section.offset_within_region;
             balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
                          !!(vq == s->dvq));
+            qemu_balloon_bitmap_update(addr, !!(vq == s->dvq));
             memory_region_unref(section.mr);
         }
 
@@ -249,6 +253,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         virtio_notify(vdev, vq);
         g_free(elem);
     }
+    qemu_mutex_unlock_balloon_bitmap();
 }
 
 static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
@@ -303,6 +308,16 @@ out:
     }
 }
 
+static void virtio_balloon_migration_state_changed(Notifier *notifier,
+                                                   void *data)
+{
+    MigrationState *mig = data;
+
+    if (migration_has_failed(mig)) {
+        qemu_balloon_reset_bitmap_data();
+    }
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -382,6 +397,16 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
                                              VIRTIO_BALLOON_PFN_SHIFT);
 }
 
+static void virtio_balloon_in_progress(void *opaque, int *status)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
+    if (cpu_to_le32(dev->actual) != cpu_to_le32(dev->num_pages)) {
+        *status = 1;
+        return;
+    }
+    *status = 0;
+}
+
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
@@ -409,6 +434,7 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, QEMUFile *f)
 
     qemu_put_be32(f, s->num_pages);
     qemu_put_be32(f, s->actual);
+    qemu_balloon_bitmap_save(f);
 }
 
 static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
@@ -426,6 +452,7 @@ static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
 
     s->num_pages = qemu_get_be32(f);
     s->actual = qemu_get_be32(f);
+    qemu_balloon_bitmap_load(f);
     return 0;
 }
 
@@ -439,7 +466,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
                 sizeof(struct virtio_balloon_config));
 
     ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
+                                   virtio_balloon_stat,
+                                   virtio_balloon_in_progress, s,
+                                   VIRTIO_BALLOON_PFN_SHIFT);
 
     if (ret < 0) {
         error_setg(errp, "Only one balloon device is supported");
@@ -453,6 +482,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
 
     reset_stats(s);
 
+    s->migration_state_notifier.notify = virtio_balloon_migration_state_changed;
+    add_migration_state_change_notifier(&s->migration_state_notifier);
+
     register_savevm(dev, "virtio-balloon", -1, 1,
                     virtio_balloon_save, virtio_balloon_load, s);
 }
@@ -462,6 +494,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
 
+    remove_migration_state_change_notifier(&s->migration_state_notifier);
     balloon_stats_destroy_timer(s);
     qemu_remove_balloon_handler(s);
     unregister_savevm(dev, "virtio-balloon", s);
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 35f62ac..1ded5a9 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
+    Notifier migration_state_notifier;
 } VirtIOBalloon;
 
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ac2c12c..6c1d1af 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -267,6 +267,7 @@ void migrate_del_blocker(Error *reason);
 
 bool migrate_postcopy_ram(void);
 bool migrate_zero_blocks(void);
+bool migrate_skip_balloon(void);
 
 bool migrate_auto_converge(void);
 
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 3f976b4..5325c38 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -15,14 +15,27 @@
 #define _QEMU_BALLOON_H
 
 #include "qapi-types.h"
+#include "migration/qemu-file.h"
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef void (QEMUBalloonInProgress) (void *opaque, int *status);
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque);
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonInProgress *progress_func,
+                             void *opaque, int pfn_shift);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+void qemu_mutex_lock_balloon_bitmap(void);
+void qemu_mutex_unlock_balloon_bitmap(void);
+void qemu_balloon_reset_bitmap_data(void);
+void qemu_balloon_bitmap_setup(void);
+int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new);
+int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate);
+int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr);
+int qemu_balloon_bitmap_save(QEMUFile *f);
+int qemu_balloon_bitmap_load(QEMUFile *f);
 
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 034a918..cb86307 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1200,6 +1200,15 @@ int migrate_use_xbzrle(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
 }
 
+bool migrate_skip_balloon(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_BALLOON];
+}
+
 int64_t migrate_xbzrle_cache_size(void)
 {
     MigrationState *s;
diff --git a/migration/ram.c b/migration/ram.c
index 704f6a9..161ab73 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "sysemu/balloon.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -65,6 +66,7 @@ static uint64_t bitmap_sync_count;
 #define RAM_SAVE_FLAG_XBZRLE 0x40
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100
+#define RAM_SAVE_FLAG_BALLOON 0x200
 
 static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
 
@@ -702,13 +704,17 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
 {
     int pages = -1;
 
-    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
-        acct_info.dup_pages++;
-        *bytes_transferred += save_page_header(f, block,
+    if (qemu_balloon_bitmap_test(block, offset) != 1) {
+        if (is_zero_range(p, TARGET_PAGE_SIZE)) {
+            acct_info.dup_pages++;
+            *bytes_transferred += save_page_header(f, block,
                                                offset | RAM_SAVE_FLAG_COMPRESS);
-        qemu_put_byte(f, 0);
-        *bytes_transferred += 1;
-        pages = 1;
+            qemu_put_byte(f, 0);
+            *bytes_transferred += 1;
+            pages = 1;
+        }
+    } else {
+        pages = 0;
     }
 
     return pages;
@@ -773,7 +779,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
              * page would be stale
              */
             xbzrle_cache_zero_page(current_addr);
-        } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
+        } else if (pages != 0 && !ram_bulk_stage && migrate_use_xbzrle()) {
             pages = save_xbzrle_page(f, &p, current_addr, block,
                                      offset, last_stage, bytes_transferred);
             if (!last_stage) {
@@ -1355,6 +1361,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
         }
 
         if (found) {
+            /* skip saving ram host page if the corresponding guest page
+             * is ballooned out
+             */
             pages = ram_save_host_page(ms, f, &pss,
                                        last_stage, bytes_transferred,
                                        dirty_ram_abs);
@@ -1959,6 +1968,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     rcu_read_unlock();
 
+    qemu_balloon_bitmap_setup();
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
@@ -1984,6 +1994,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     ram_control_before_iterate(f, RAM_CONTROL_ROUND);
 
+    qemu_put_be64(f, RAM_SAVE_FLAG_BALLOON);
+    qemu_balloon_bitmap_save(f);
+
     t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0) {
@@ -2493,6 +2506,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
             break;
 
+        case RAM_SAVE_FLAG_BALLOON:
+            qemu_balloon_bitmap_load(f);
+            break;
+
         case RAM_SAVE_FLAG_COMPRESS:
             ch = qemu_get_byte(f);
             ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
diff --git a/qapi-schema.json b/qapi-schema.json
index 7f8d799..38163ca 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -544,11 +544,14 @@
 #          been migrated, pulling the remaining pages along as needed. NOTE: If
 #          the migration fails during postcopy the VM will fail. (since 2.6)
 #
+# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
+#          (since 2.7)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress', 'events', 'postcopy-ram'] }
+           'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
 
 ##
 # @MigrationCapabilityStatus