From patchwork Thu Oct 10 07:30:54 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suleiman Souhlal
X-Patchwork-Id: 11182813
Date: Thu, 10 Oct 2019 16:30:54 +0900
In-Reply-To: <20191010073055.183635-1-suleiman@google.com>
Message-Id: <20191010073055.183635-2-suleiman@google.com>
References: <20191010073055.183635-1-suleiman@google.com>
X-Mailer: git-send-email 2.23.0.581.g78d2f28ef7-goog
Subject: [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
From: Suleiman Souhlal
To: pbonzini@redhat.com, rkrcmar@redhat.com, tglx@linutronix.de
Cc: john.stultz@linaro.org, sboyd@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ssouhlal@freebsd.org, tfiga@chromium.org, vkuznets@redhat.com, Suleiman Souhlal
X-Mailing-List: kvm@vger.kernel.org

This is used to synchronize time between host and guest.
The guest can request the (guest) physical address it wants the
data in through the MSR_KVM_TIMEKEEPER_EN MSR.

It currently assumes the host timekeeper is "tsc".
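As a rough illustration of the generation-count handshake the host side
uses when it publishes the structure (make gen odd, write the data, make
gen even again) and of the retry check a reader runs against it, here is
a userspace sketch. It is not kernel code: the demo_* names and the
two-field struct are hypothetical, and C11 fences stand in for
smp_wmb()/virt_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pvclock_timekeeper: gen first, then data. */
struct demo_timekeeper {
	_Atomic uint64_t gen;
	uint64_t xtime_sec;
	uint64_t tsc_offset;
};

/* Writer, step 1: make gen odd to mark an update in progress. */
static void demo_write_begin(struct demo_timekeeper *tk)
{
	uint64_t gen = atomic_load_explicit(&tk->gen, memory_order_relaxed);

	atomic_store_explicit(&tk->gen, (gen + 1) | 1, memory_order_relaxed);
	/* Order the gen store before the data stores (role of smp_wmb()). */
	atomic_thread_fence(memory_order_release);
}

/* Writer, step 2: data is in place, make gen even again. */
static void demo_write_end(struct demo_timekeeper *tk)
{
	uint64_t gen = atomic_load_explicit(&tk->gen, memory_order_relaxed);

	/* Order the data stores before the final gen store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&tk->gen, gen + 1, memory_order_relaxed);
}

static void demo_publish(struct demo_timekeeper *tk, uint64_t sec, uint64_t off)
{
	demo_write_begin(tk);
	tk->xtime_sec = sec;
	tk->tsc_offset = off;
	demo_write_end(tk);
}

/*
 * Reader: sample gen with the low bit cleared, so that an odd
 * (in-progress) gen can never compare equal in demo_read_retry().
 */
static uint64_t demo_read_begin(struct demo_timekeeper *tk)
{
	uint64_t gen = atomic_load_explicit(&tk->gen, memory_order_relaxed);

	/* Order the gen load before the data loads (role of virt_rmb()). */
	atomic_thread_fence(memory_order_acquire);
	return gen & ~UINT64_C(1);
}

static bool demo_read_retry(struct demo_timekeeper *tk, uint64_t gen)
{
	/* Order the data loads before re-reading gen. */
	atomic_thread_fence(memory_order_acquire);
	return gen != atomic_load_explicit(&tk->gen, memory_order_relaxed);
}

/* One read attempt; returns false if the copy may be torn. */
static bool demo_try_read(struct demo_timekeeper *tk, uint64_t *sec,
    uint64_t *off)
{
	uint64_t gen = demo_read_begin(tk);

	*sec = tk->xtime_sec;
	*off = tk->tsc_offset;
	return !demo_read_retry(tk, gen);
}
```

A reader would normally call demo_try_read() in a loop until it returns
true. The data fields are plain (non-atomic) here for brevity, so the
sketch is only meant to be exercised from a single thread.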

Signed-off-by: Suleiman Souhlal
---
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 50eb430b0ad8..4d622450cb4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -659,7 +659,10 @@ struct kvm_vcpu_arch {
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_hva_cache pv_time;
+	struct gfn_to_hva_cache pv_timekeeper_g2h;
+	struct pvclock_timekeeper pv_timekeeper;
 	bool pv_time_enabled;
+	bool pv_timekeeper_enabled;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 1436226efe3e..2809008b9b26 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,6 +40,33 @@ struct pvclock_wall_clock {
 	u32 nsec;
 } __attribute__((__packed__));
 
+struct pvclock_read_base {
+	u64 mask;
+	u64 cycle_last;
+	u32 mult;
+	u32 shift;
+	u64 xtime_nsec;
+	u64 base;
+} __attribute__((__packed__));
+
+struct pvclock_timekeeper {
+	u64 gen;
+	u64 flags;
+	struct pvclock_read_base tkr_mono;
+	struct pvclock_read_base tkr_raw;
+	u64 xtime_sec;
+	u64 ktime_sec;
+	u64 wall_to_monotonic_sec;
+	u64 wall_to_monotonic_nsec;
+	u64 offs_real;
+	u64 offs_boot;
+	u64 offs_tai;
+	u64 raw_sec;
+	u64 tsc_offset;
+} __attribute__((__packed__));
+
+#define PVCLOCK_TIMEKEEPER_ENABLED (1 << 0)
+
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
 /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..3ebb1d87db3a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
 #define MSR_KVM_POLL_CONTROL 0x4b564d05
+#define MSR_KVM_TIMEKEEPER_EN 0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 661e2bf38526..937f83cdda4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static atomic_t pv_timekeepers_nr;
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -2729,6 +2731,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		break;
 	}
+	case MSR_KVM_TIMEKEEPER_EN:
+		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+		    &vcpu->arch.pv_timekeeper_g2h, data,
+		    sizeof(struct pvclock_timekeeper)))
+			vcpu->arch.pv_timekeeper_enabled = false;
+		else {
+			vcpu->arch.pv_timekeeper_enabled = true;
+			atomic_inc(&pv_timekeepers_nr);
+		}
+		break;
 	case MSR_KVM_ASYNC_PF_EN:
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
@@ -7097,6 +7109,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 };
 
 #ifdef CONFIG_X86_64
+static DEFINE_SPINLOCK(shadow_pvtk_lock);
+static struct pvclock_timekeeper shadow_pvtk;
+
+static void
+pvclock_copy_read_base(struct pvclock_read_base *pvtkr,
+    struct tk_read_base *tkr)
+{
+	pvtkr->cycle_last = tkr->cycle_last;
+	pvtkr->mult = tkr->mult;
+	pvtkr->shift = tkr->shift;
+	pvtkr->mask = tkr->mask;
+	pvtkr->xtime_nsec = tkr->xtime_nsec;
+	pvtkr->base = tkr->base;
+}
+
+static void
+kvm_copy_into_pvtk(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	if (!vcpu->arch.pv_timekeeper_enabled)
+		return;
+
+	pvtk = &vcpu->arch.pv_timekeeper;
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) {
+		pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED;
+		spin_lock_irqsave(&shadow_pvtk_lock, flags);
+		pvtk->tkr_mono = shadow_pvtk.tkr_mono;
+		pvtk->tkr_raw = shadow_pvtk.tkr_raw;
+
+		pvtk->xtime_sec = shadow_pvtk.xtime_sec;
+		pvtk->ktime_sec = shadow_pvtk.ktime_sec;
+		pvtk->wall_to_monotonic_sec =
+		    shadow_pvtk.wall_to_monotonic_sec;
+		pvtk->wall_to_monotonic_nsec =
+		    shadow_pvtk.wall_to_monotonic_nsec;
+		pvtk->offs_real = shadow_pvtk.offs_real;
+		pvtk->offs_boot = shadow_pvtk.offs_boot;
+		pvtk->offs_tai = shadow_pvtk.offs_tai;
+		pvtk->raw_sec = shadow_pvtk.raw_sec;
+		spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+
+		pvtk->tsc_offset = kvm_x86_ops->read_l1_tsc_offset(vcpu);
+	} else
+		pvtk->flags &= ~PVCLOCK_TIMEKEEPER_ENABLED;
+
+	BUILD_BUG_ON(offsetof(struct pvclock_timekeeper, gen) != 0);
+
+	/*
+	 * Make the gen count odd to indicate we are in the process of
+	 * updating.
+	 */
+	vcpu->arch.pv_timekeeper.gen++;
+	vcpu->arch.pv_timekeeper.gen |= 1;
+
+	/*
+	 * See comment in kvm_guest_time_update() for why we have to do
+	 * multiple writes.
+	 */
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+
+	smp_wmb();
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper));
+
+	smp_wmb();
+
+	vcpu->arch.pv_timekeeper.gen++;
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+}
+
+static void
+update_shadow_pvtk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	pvtk = &shadow_pvtk;
+
+	if (atomic_read(&pv_timekeepers_nr) == 0 ||
+	    pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+		return;
+
+	spin_lock_irqsave(&shadow_pvtk_lock, flags);
+	pvclock_copy_read_base(&pvtk->tkr_mono, &tk->tkr_mono);
+	pvclock_copy_read_base(&pvtk->tkr_raw, &tk->tkr_raw);
+
+	pvtk->xtime_sec = tk->xtime_sec;
+	pvtk->ktime_sec = tk->ktime_sec;
+	pvtk->wall_to_monotonic_sec = tk->wall_to_monotonic.tv_sec;
+	pvtk->wall_to_monotonic_nsec = tk->wall_to_monotonic.tv_nsec;
+	pvtk->offs_real = tk->offs_real;
+	pvtk->offs_boot = tk->offs_boot;
+	pvtk->offs_tai = tk->offs_tai;
+	pvtk->raw_sec = tk->raw_sec;
+	spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+}
+
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
 	struct kvm *kvm;
@@ -7124,6 +7239,7 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	struct timekeeper *tk = priv;
 
 	update_pvclock_gtod(tk);
+	update_shadow_pvtk(tk);
 
 	/* disable master clock if host does not trust, or does not
 	 * use, TSC based clocksource.
@@ -7940,6 +8056,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
+	kvm_copy_into_pvtk(vcpu);
+
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
 			kvm_x86_ops->get_vmcs12_pages(vcpu);
@@ -9020,6 +9138,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
 	kvmclock_reset(vcpu);
 
+	if (vcpu->arch.pv_timekeeper_enabled)
+		atomic_dec(&pv_timekeepers_nr);
+
 	kvm_x86_ops->vcpu_free(vcpu);
 	free_cpumask_var(wbinvd_dirty_mask);
 }

From patchwork Thu Oct 10 07:30:55 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suleiman Souhlal
X-Patchwork-Id: 11182811
Date: Thu, 10 Oct 2019 16:30:55 +0900
In-Reply-To: <20191010073055.183635-1-suleiman@google.com>
Message-Id: <20191010073055.183635-3-suleiman@google.com>
References: <20191010073055.183635-1-suleiman@google.com>
X-Mailer: git-send-email 2.23.0.581.g78d2f28ef7-goog
Subject: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
From: Suleiman Souhlal
To: pbonzini@redhat.com, rkrcmar@redhat.com, tglx@linutronix.de
Cc: john.stultz@linaro.org, sboyd@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ssouhlal@freebsd.org, tfiga@chromium.org, vkuznets@redhat.com, Suleiman Souhlal
X-Mailing-List: kvm@vger.kernel.org

When kvm-hostclock is selected, and the host supports it, update our
timekeeping parameters to be the same as the host. This lets us have
our time synchronized with the host's, even in the presence of host NTP
or suspend.

Signed-off-by: Suleiman Souhlal
---
 arch/x86/Kconfig                    |   9 ++
 arch/x86/include/asm/kvmclock.h     |  12 +++
 arch/x86/kernel/Makefile            |   2 +
 arch/x86/kernel/kvmclock.c          |   5 +-
 arch/x86/kernel/kvmhostclock.c      | 130 ++++++++++++++++++++++++++++
 include/linux/timekeeper_internal.h |   8 ++
 kernel/time/timekeeping.c           |   2 +
 7 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/kvmhostclock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..c5b1257ea969 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -839,6 +839,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config KVM_HOSTCLOCK
+	bool "kvmclock uses host timekeeping"
+	depends on KVM_GUEST
+	help
+	  Select this option to make the guest use the same timekeeping
+	  parameters as the host. This means that time will be almost
+	  exactly the same between the two. Only works if the host uses "tsc"
+	  clocksource.
+
 config JAILHOUSE_GUEST
 	bool "Jailhouse non-root cell support"
 	depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h
index eceea9299097..de1a590ff97e 100644
--- a/arch/x86/include/asm/kvmclock.h
+++ b/arch/x86/include/asm/kvmclock.h
@@ -2,6 +2,18 @@
 #ifndef _ASM_X86_KVM_CLOCK_H
 #define _ASM_X86_KVM_CLOCK_H
 
+#include
+
 extern struct clocksource kvm_clock;
 
+unsigned long kvm_get_tsc_khz(void);
+
+#ifdef CONFIG_KVM_HOSTCLOCK
+void kvm_hostclock_init(void);
+#else
+static inline void kvm_hostclock_init(void)
+{
+}
+#endif /* KVM_HOSTCLOCK */
+
 #endif /* _ASM_X86_KVM_CLOCK_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..bc7be935fc5e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -17,6 +17,7 @@ CFLAGS_REMOVE_tsc.o = -pg
 CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
 CFLAGS_REMOVE_pvclock.o = -pg
 CFLAGS_REMOVE_kvmclock.o = -pg
+CFLAGS_REMOVE_kvmhostclock.o = -pg
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
@@ -112,6 +113,7 @@ obj-$(CONFIG_AMD_NB) += amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
+obj-$(CONFIG_KVM_HOSTCLOCK)	+= kvmhostclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 904494b924c1..4ab862de9777 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,7 +125,7 @@ static inline void kvm_sched_clock_init(bool stable)
  * poll of guests can be running and trouble each other. So we preset
  * lpj here
  */
-static unsigned long kvm_get_tsc_khz(void)
+unsigned long kvm_get_tsc_khz(void)
 {
 	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
 	return pvclock_tsc_khz(this_cpu_pvti());
@@ -366,5 +366,8 @@ void __init kvmclock_init(void)
 		kvm_clock.rating = 299;
 
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
+
+	kvm_hostclock_init();
+
 	pv_info.name = "KVM";
 }
diff --git a/arch/x86/kernel/kvmhostclock.c b/arch/x86/kernel/kvmhostclock.c
new file mode 100644
index 000000000000..9971343c2bed
--- /dev/null
+++ b/arch/x86/kernel/kvmhostclock.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM clocksource that uses host timekeeping.
+ * Copyright (c) 2019 Suleiman Souhlal, Google LLC
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct pvclock_timekeeper pv_timekeeper;
+
+static bool pv_timekeeper_enabled;
+static bool pv_timekeeper_present;
+static int old_vclock_mode;
+
+static u64
+kvm_hostclock_get_cycles(struct clocksource *cs)
+{
+	return rdtsc_ordered();
+}
+
+static int
+kvm_hostclock_enable(struct clocksource *cs)
+{
+	pv_timekeeper_enabled = 1;
+
+	old_vclock_mode = kvm_clock.archdata.vclock_mode;
+	kvm_clock.archdata.vclock_mode = VCLOCK_TSC;
+	return 0;
+}
+
+static void
+kvm_hostclock_disable(struct clocksource *cs)
+{
+	pv_timekeeper_enabled = 0;
+	kvm_clock.archdata.vclock_mode = old_vclock_mode;
+}
+
+struct clocksource kvm_hostclock = {
+	.name = "kvm-hostclock",
+	.read = kvm_hostclock_get_cycles,
+	.enable = kvm_hostclock_enable,
+	.disable = kvm_hostclock_disable,
+	.rating = 401, /* Higher than kvm-clock */
+	.mask = CLOCKSOURCE_MASK(64),
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static void
+pvclock_copy_into_read_base(struct pvclock_timekeeper *pvtk,
+    struct tk_read_base *tkr, struct pvclock_read_base *pvtkr)
+{
+	int shift_diff;
+
+	tkr->mask = pvtkr->mask;
+	tkr->cycle_last = pvtkr->cycle_last + pvtk->tsc_offset;
+	tkr->mult = pvtkr->mult;
+	shift_diff = tkr->shift - pvtkr->shift;
+	tkr->shift = pvtkr->shift;
+	tkr->xtime_nsec = pvtkr->xtime_nsec;
+	tkr->base = pvtkr->base;
+}
+
+static u64
+pvtk_read_begin(struct pvclock_timekeeper *pvtk)
+{
+	u64 gen;
+
+	gen = pvtk->gen & ~1;
+	/* Make sure that the gen count is read before the data. */
+	virt_rmb();
+
+	return gen;
+}
+
+static bool
+pvtk_read_retry(struct pvclock_timekeeper *pvtk, u64 gen)
+{
+	/* Make sure that the gen count is re-read after the data. */
+	virt_rmb();
+	return unlikely(gen != pvtk->gen);
+}
+
+void
+kvm_clock_copy_into_tk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	u64 gen;
+
+	if (!pv_timekeeper_enabled)
+		return;
+
+	pvtk = &pv_timekeeper;
+	do {
+		gen = pvtk_read_begin(pvtk);
+		if (!(pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED))
+			return;
+
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_mono,
+		    &pvtk->tkr_mono);
+		pvclock_copy_into_read_base(pvtk, &tk->tkr_raw, &pvtk->tkr_raw);
+
+		tk->xtime_sec = pvtk->xtime_sec;
+		tk->ktime_sec = pvtk->ktime_sec;
+		tk->wall_to_monotonic.tv_sec = pvtk->wall_to_monotonic_sec;
+		tk->wall_to_monotonic.tv_nsec = pvtk->wall_to_monotonic_nsec;
+		tk->offs_real = pvtk->offs_real;
+		tk->offs_boot = pvtk->offs_boot;
+		tk->offs_tai = pvtk->offs_tai;
+		tk->raw_sec = pvtk->raw_sec;
+	} while (pvtk_read_retry(pvtk, gen));
+}
+
+void __init
+kvm_hostclock_init(void)
+{
+	unsigned long pa;
+
+	pa = __pa(&pv_timekeeper);
+	wrmsrl(MSR_KVM_TIMEKEEPER_EN, pa);
+	if (pv_timekeeper.flags & PVCLOCK_TIMEKEEPER_ENABLED) {
+		pv_timekeeper_present = 1;
+
+		clocksource_register_khz(&kvm_hostclock, kvm_get_tsc_khz());
+	}
+}
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 84ff2844df2a..43b036375cdc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -153,4 +153,12 @@ static inline void update_vsyscall_tz(void)
 }
 #endif
 
+#ifdef CONFIG_KVM_HOSTCLOCK
+void kvm_clock_copy_into_tk(struct timekeeper *tk);
+#else
+static inline void kvm_clock_copy_into_tk(struct timekeeper *tk)
+{
+}
+#endif
+
 #endif /* _LINUX_TIMEKEEPER_INTERNAL_H */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ca69290bee2a..09bcf13b2334 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2107,6 +2107,8 @@ static void timekeeping_advance(enum timekeeping_adv_mode mode)
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
 	write_seqcount_begin(&tk_core.seq);
+	kvm_clock_copy_into_tk(tk);
+
 	/*
 	 * Update the real timekeeper.
 	 *