From patchwork Thu Jul 29 00:10:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12407073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B992C4338F for ; Thu, 29 Jul 2021 00:14:14 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 233A560F02 for ; Thu, 29 Jul 2021 00:14:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 233A560F02 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=eHB3VxYnGqXVpnkpMdOYEahZBDdyEO9onXgpMsgpzQ0=; b=WALX7UZekbxhC2Agao5jMxkFvM 7gMXJ4myA4CywK/3A9WOijFQHuEiMozCtPCnGCuodiQnBGNXO7sAL49KeaCNpe/8twvlTXkfNjRlX Rtl31JfkQoW6sF6wki1fsU+jnjDHBdUT2koYx6ZSR/X+ECljvAybX7Wj7Ba5UF1E295aY4/A0Fnqo 1xO8xZ40sxzMl4E2/l7ezYVL3FZl9rj/LXymoA7IYc9ktUHEUaYEjZg4PyA4CsE91JIyrzzvrlBTI e4MpoHj/RKfhI0QBzjiA0+CxoT1QwqMpQUku+I8laFMnt/pXZdASj8yJGydxUmlem89MUlIMuL3mu DbFRkiSA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1m8tdg-002eU3-4q; Thu, 29 Jul 2021 00:11:04 +0000 Received: from mail-io1-xd49.google.com ([2607:f8b0:4864:20::d49]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1m8td4-002eLx-KA for linux-arm-kernel@lists.infradead.org; Thu, 29 Jul 2021 00:10:29 +0000 Received: by mail-io1-xd49.google.com with SMTP id z21-20020a5d84d50000b02904e00bb129f0so2693223ior.18 for ; Wed, 28 Jul 2021 17:10:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=XyRqLIzc237XCfQYWkdhdtxQsXsqd91g1nj8S9xVJ0Q=; b=o+yBPb4ttzcaEPnt4bTYjTs5iRBNXyhrpOGKn7Uv48QckZG6qNyFKXiw6dP2KtDXff qJOYW86N/EZPzvnb7cgowWR2ixyYjEtz4prZamyXV1ebTQE7aCr4L5SmgMVpVqLIBxGV Q65eXvKMYLmLwC1vxKtYIcPZirbhrr8p2keBsRM4mzYRA1zcOahxxBViAisZOYU0BhX9 mAgaeNavx1UxCh6E0zaxz0oq+San4A+wzkpNFlU7Y3OSPCvEQWWSXtNNmtdTnGm2f1z+ QGRZBRBKbNJ/v/nVRBtHghi49RE1ZD/5OGRjZQyPeU+vzmABbetsG5Tdv2xJdUj5ZSON hDQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=XyRqLIzc237XCfQYWkdhdtxQsXsqd91g1nj8S9xVJ0Q=; b=ghEqWUn3VxnNobMCD9/2M2n+68IUybsdJRLiy1uNFuV0MTJVtbwXf9d1wOfVRbmeLe 7cs2ynyfWh037fJWI9I10krxPLgBlcYOVgYrj4noWKlbWG1oz2XtQXFhAkL0XjOtbgWl YLAhfTHkHrgAfjCuZmcfZj4zt0Clj6/Y0kJSAJnpgtYmfq5HkD92zST55vdwojKxhJ5N epuIdh5rMafj4OFpwIh3GH/bUuru6LdCzejmJPeQUm/iiavB/XKDmBN/Z7HP4HUyQoBC MSMaUFwQd0Rjrtoom1FDLSJqrpDD76wQmkX3QC7DfsniJNnfdW3Ssm07yINVn8ghkbVi bvQQ== X-Gm-Message-State: AOAM530JAMauOaTkpJC9t3w8ehQ0OXCAri95UcpmyjKmmBbzwuvVmz1B 4Ex/hds9Hr7OUTsF1KAdO2yir3sfdC0= X-Google-Smtp-Source: ABdhPJx5H36OkSwcCIAWuRYPEx6hpdUqLr/Dc6mSfTMy/fYIQ1uRbi7lkHUDTYHkvWjHliao4VVX8HXijtg= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6e02:dcd:: with SMTP id l13mr1679282ilj.300.1627517424959; Wed, 28 Jul 2021 17:10:24 -0700 (PDT) Date: Thu, 29 Jul 2021 00:10:02 +0000 In-Reply-To: <20210729001012.70394-1-oupton@google.com> Message-Id: <20210729001012.70394-4-oupton@google.com> Mime-Version: 1.0 References: <20210729001012.70394-1-oupton@google.com> X-Mailer: git-send-email 2.32.0.432.gabb21c7263-goog Subject: [PATCH v4 03/13] KVM: x86: Expose TSC offset controls to userspace From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Oliver Upton , Oliver Upton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210728_171026_713468_3AA4CE50 X-CRM114-Status: GOOD ( 23.98 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org To date, VMM-directed TSC synchronization and migration has been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused. Cc: David Matlack Signed-off-by: Oliver Upton --- Documentation/virt/kvm/devices/vcpu.rst | 57 ++++++++ arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 4 + arch/x86/kvm/x86.c | 167 ++++++++++++++++++++++++ 4 files changed, 229 insertions(+) diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 2acec3b9ef65..b46d5f742e69 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The base address must be 64 byte aligned and exist within a valid guest memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information including the layout of the stolen time structure. + +4. GROUP: KVM_VCPU_TSC_CTRL +=========================== + +:Architectures: x86 + +4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET + +:Parameters: 64-bit unsigned TSC offset + +Returns: + + ======= ====================================== + -EFAULT Error reading/writing the provided + parameter address. + -ENXIO Attribute not supported + ======= ====================================== + +Specifies the guest's TSC offset relative to the host's TSC. The guest's +TSC is then derived by the following equation: + + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET + +This attribute is useful for the precise migration of a guest's TSC. The +following describes a possible algorithm to use for the migration of a +guest's TSC: + +From the source VMM process: + +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + +2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the + guest TSC offset (off_n). + +3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the + guest's TSC (freq). + +From the destination VMM process: + +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. + +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and + kvmclock nanoseconds (k_1). + +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + new_off_n = t_0 + off_n = (k_1 - k_0) * freq - t_1 + +7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the + respective value derived in the previous step. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 65a20c64f959..855698923dd0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1070,6 +1070,7 @@ struct kvm_arch { u64 last_tsc_nsec; u64 last_tsc_write; u32 last_tsc_khz; + u64 last_tsc_offset; u64 cur_tsc_nsec; u64 cur_tsc_write; u64 cur_tsc_offset; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index a6c327f8ad9e..0b22e1e84e78 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -503,4 +503,8 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */ +#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ +#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 27435a07fb46..17d87a8d0c75 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2413,6 +2413,11 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset) static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset); } +static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.l1_tsc_offset; +} + static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier) { vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier; @@ -2469,6 +2474,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, kvm->arch.last_tsc_nsec = ns; kvm->arch.last_tsc_write = tsc; kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; + kvm->arch.last_tsc_offset = offset; vcpu->arch.last_guest_tsc = tsc; @@ -4928,6 +4934,137 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) return 0; } +static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: + r = 0; + break; + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + void __user *uaddr = (void __user *)attr->addr; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: { + u64 offset; + + offset = kvm_vcpu_read_tsc_offset(vcpu); + r = -EFAULT; + if (copy_to_user(uaddr, &offset, sizeof(offset))) + break; + + r = 0; + break; + } + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + void __user *uaddr = (void __user *)attr->addr; + struct kvm *kvm = vcpu->kvm; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: { + u64 offset, tsc, ns; + unsigned long flags; + bool matched; + + r = -EFAULT; + if (copy_from_user(&offset, uaddr, sizeof(offset))) + break; + + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); + + matched = (vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_offset == offset); + + tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset; + ns = get_kvmclock_base_ns(); + + __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); + + r = 0; + break; + } + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_vcpu_ioctl_has_device_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->group) { + case KVM_VCPU_TSC_CTRL: + r = kvm_arch_tsc_has_attr(vcpu, attr); + break; + default: + r = -ENXIO; + break; + } + + return r; +} + +static int kvm_vcpu_ioctl_get_device_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->group) { + case KVM_VCPU_TSC_CTRL: + r = kvm_arch_tsc_get_attr(vcpu, attr); + break; + default: + r = -ENXIO; + break; + } + + return r; +} + +static int kvm_vcpu_ioctl_set_device_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->group) { + case KVM_VCPU_TSC_CTRL: + r = kvm_arch_tsc_set_attr(vcpu, attr); + break; + default: + r = -ENXIO; + break; + } + + return r; +} + static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, struct kvm_enable_cap *cap) { @@ -5382,6 +5519,36 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = __set_sregs2(vcpu, u.sregs2); break; } + case KVM_HAS_DEVICE_ATTR: { + struct kvm_device_attr attr; + + r = -EFAULT; + if (copy_from_user(&attr, argp, sizeof(attr))) + goto out; + + r = kvm_vcpu_ioctl_has_device_attr(vcpu, &attr); + break; + } + case KVM_GET_DEVICE_ATTR: { + struct kvm_device_attr attr; + + r = -EFAULT; + if (copy_from_user(&attr, argp, sizeof(attr))) + goto out; + + r = kvm_vcpu_ioctl_get_device_attr(vcpu, &attr); + break; + } + case KVM_SET_DEVICE_ATTR: { + struct kvm_device_attr attr; + + r = -EFAULT; + if (copy_from_user(&attr, argp, sizeof(attr))) + goto out; + + r = kvm_vcpu_ioctl_set_device_attr(vcpu, &attr); + break; + } default: r = -EINVAL; }