From patchwork Fri Nov 4 09:43:24 2016
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 9412291
Date: Fri, 4 Nov 2016 07:43:24 -0200
From: Marcelo Tosatti
To: kvm@vger.kernel.org, qemu-devel
Cc: Paolo Bonzini, Juan Quintela, "Dr. David Alan Gilbert", Eduardo Habkost, Radim Krčmář
Message-ID: <20161104094322.GA16930@amt.cnet>
Subject: [Qemu-devel] [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save

On the pre-copy migration code path, this patch measures the time
between vm_stop() and pre_save(), which includes copying the remaining
RAM to the destination, and advances the clock by that amount.

In a VM with 5 seconds of downtime, this reduces the guest clock
difference on the destination from 5s to 0.2s.

Please do not apply this yet: some code paths still need checking, and
I am submitting early for comments.

Signed-off-by: Marcelo Tosatti

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 0f75dd3..1bd8fd6 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -22,9 +22,11 @@
 #include "kvm_i386.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
+#include "migration/migration.h"
 
 #include
 #include
+#include
 
 #define TYPE_KVM_CLOCK "kvmclock"
 #define KVM_CLOCK(obj) OBJECT_CHECK(KVMClockState, (obj), TYPE_KVM_CLOCK)
@@ -35,7 +37,11 @@ typedef struct KVMClockState {
     /*< public >*/
 
     uint64_t clock;
+    uint64_t ns;
     bool clock_valid;
+
+    uint64_t advance_clock;
+    struct timespec t_aftervmstop;
 } KVMClockState;
 
 struct pvclock_vcpu_time_info {
@@ -100,6 +106,11 @@ static void kvmclock_vm_state_change(void *opaque, int running,
             s->clock = time_at_migration;
         }
 
+        if (s->advance_clock && s->clock + s->advance_clock > s->clock) {
+            s->clock += s->advance_clock;
+            s->advance_clock = 0;
+        }
+
         data.clock = s->clock;
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
@@ -135,6 +146,18 @@ static void kvmclock_vm_state_change(void *opaque, int running,
             abort();
         }
         s->clock = data.clock;
+        /*
+         * Transition from VM-running to VM-stopped via migration?
+         * Record when the VM was stopped.
+         */
+
+        if (state == RUN_STATE_FINISH_MIGRATE &&
+            !migration_in_postcopy(migrate_get_current())) {
+            clock_gettime(CLOCK_MONOTONIC, &s->t_aftervmstop);
+        } else {
+            s->t_aftervmstop.tv_sec = 0;
+            s->t_aftervmstop.tv_nsec = 0;
+        }
 
         /*
          * If the VM is stopped, declare the clock state valid to
@@ -152,12 +175,66 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 }
 
+static uint64_t clock_delta(struct timespec *before, struct timespec *after)
+{
+    if (before->tv_sec > after->tv_sec ||
+        (before->tv_sec == after->tv_sec &&
+         before->tv_nsec > after->tv_nsec)) {
+        fprintf(stderr, "clock_delta failed: before=(%ld sec, %ld nsec),"
+                        "after=(%ld sec, %ld nsec)\n", before->tv_sec,
+                        before->tv_nsec, after->tv_sec, after->tv_nsec);
+        abort();
+    }
+
+    return (after->tv_sec - before->tv_sec) * 1000000000ULL +
+            after->tv_nsec - before->tv_nsec;
+}
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct timespec now;
+    uint64_t ns;
+
+    if (s->t_aftervmstop.tv_sec == 0) {
+        return;
+    }
+
+    clock_gettime(CLOCK_MONOTONIC, &now);
+
+    ns = clock_delta(&s->t_aftervmstop, &now);
+
+    /*
+     * Linux guests can overflow if time jumps
+     * forward in large increments.
+     * Cap maximum adjustment to 10 minutes.
+     */
+    ns = MIN(ns, 600000000000ULL);
+
+    if (s->clock + ns > s->clock) {
+        s->ns = ns;
+    }
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+
+    /* save the value from incoming migration */
+    s->advance_clock = s->ns;
+
+    return 0;
+}
+
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_UINT64_V(ns, KVMClockState, 2),
         VMSTATE_END_OF_LIST()
     }
 };
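
Note for reviewers: the arithmetic in kvmclock_pre_save() (elapsed
time between two CLOCK_MONOTONIC samples, the 10 minute cap, and the
wrap-around guard) can be exercised outside QEMU. The following is a
minimal standalone sketch, not part of the patch. The names
timespec_delta_ns(), advance_clock_capped(), NSEC_PER_SEC and
MAX_ADVANCE_NS are hypothetical and introduced only for illustration.
Unlike clock_delta() in the patch, the sketch reports out-of-order
timestamps through its return value instead of calling abort().

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL
/* Same cap as in the patch: 10 minutes, expressed in nanoseconds. */
#define MAX_ADVANCE_NS (600ULL * NSEC_PER_SEC)

/*
 * Hypothetical helper: store (after - before) in nanoseconds in *delta_ns.
 * Returns false, leaving *delta_ns untouched, if 'before' is later
 * than 'after'.
 */
static bool timespec_delta_ns(const struct timespec *before,
                              const struct timespec *after,
                              uint64_t *delta_ns)
{
    if (before->tv_sec > after->tv_sec ||
        (before->tv_sec == after->tv_sec &&
         before->tv_nsec > after->tv_nsec)) {
        return false;
    }

    /*
     * Work in signed 64-bit first: the nanosecond difference is negative
     * whenever after->tv_nsec < before->tv_nsec.
     */
    int64_t sec = (int64_t)after->tv_sec - (int64_t)before->tv_sec;
    int64_t nsec = (int64_t)after->tv_nsec - (int64_t)before->tv_nsec;

    *delta_ns = (uint64_t)(sec * (int64_t)NSEC_PER_SEC + nsec);
    return true;
}

/*
 * Hypothetical helper: return 'clock' advanced by 'delta_ns', with the
 * delta capped at MAX_ADVANCE_NS. If the addition would wrap around
 * UINT64_MAX, the clock is returned unchanged.
 */
static uint64_t advance_clock_capped(uint64_t clock, uint64_t delta_ns)
{
    if (delta_ns > MAX_ADVANCE_NS) {
        delta_ns = MAX_ADVANCE_NS;
    }
    if (delta_ns > UINT64_MAX - clock) {
        return clock; /* would wrap: skip the adjustment */
    }
    return clock + delta_ns;
}
```

With samples taken at 10.9s and 15.1s, timespec_delta_ns() yields
4200000000 ns. The explicit comparison against UINT64_MAX - clock
plays the same role as the "s->clock + ns > s->clock" test in the
patch, but states the wrap-around condition directly.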