From patchwork Mon Aug 16 00:11:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12437507 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EAD3AC4338F for ; Mon, 16 Aug 2021 00:15:55 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B70E2610FD for ; Mon, 16 Aug 2021 00:15:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B70E2610FD Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=SErpxV8tH1R6vNxzgON2Vx458sDqpGbecWWHu86dTfk=; b=23kHRyyH2gt8XRMkhS71BnOW0c afM4WZXHRoz/Bt4MeJAVmnMwxbEb0jn7352dC34SBcdep3+31Mghcm2/suQ90Hk+yw8I6jrkimhor 
mGGJKLWrQgi8ovKJ14/kZSaXGOFqmuSgvKLz4ebt0TmqNan2VRg/KY+LKE2w0uaafFykoMKv3Ma3Q fK25vTpycgWTrBpZU1E2cSs0qCWxoqkXjcL62MPnK98Ft242Co+MeCwJPLCvpUHYqaXTtJRiOgMhA cauNR/q2Aybp4lJlr6TrS/uC4FIdj6Dhs1beIqJAmivKUtaMXFwhfCmFYlArlwd0iToZFbJm7Wsm4 7Kv4nGLQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mFQEI-00Fl9A-8S; Mon, 16 Aug 2021 00:11:50 +0000 Received: from mail-il1-x149.google.com ([2607:f8b0:4864:20::149]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mFQE5-00Fl6a-6v for linux-arm-kernel@lists.infradead.org; Mon, 16 Aug 2021 00:11:38 +0000 Received: by mail-il1-x149.google.com with SMTP id a2-20020a9266020000b0290222005f354cso8726871ilc.4 for ; Sun, 15 Aug 2021 17:11:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=J0lJJJkW+NmZlpb6fpFofwzCF2N0YPn/fE7VW8sEN0E=; b=onMRm+c0TmbvvH91s8krImCv1CKEk4b274WYjrqgKk7gKLOUw9VrXJClkZ1DmSna/8 3tepivV3mzLKvagWxhpO+/WQdfkPceIPNKfBb2bGq51hsXJu/WQCIkjs874UB7dk0W9W oXAUvi79qPx6l8QqPRpDusbWhyXuCw7E1AogPxUA0fgJz83AKxK+/O+IR3aKYsxFeqZM JjNSmzkGS23X+RekEHj1AacWn1arDI/RxjwhBttOKfWKf9+zoiG8nYGaSZvEuGaZs74t Vs8UyftyhM2f38KfhifByPOjskfYoVIv781b+80yLp+3M+udeuw+Zi4uFZAtu/XHAmkx u6/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=J0lJJJkW+NmZlpb6fpFofwzCF2N0YPn/fE7VW8sEN0E=; b=CS6nXRjNUa6vXX0u5CM0bCsC1hMUebH0ZPXSYfWufXpE+mVYX03WxCNqbBgHtFwUZi Z/SqD16zV6iI/YtwEY9l1s7pdZuycHU6YJ21AZyVVzpx0mtmMeYWYhNLp7L8HRVgio+H xxj7G3yf8CHEQyJHkdTMbHVGAnj+sAHApAsTURKVisR1vous05QxNBBt6ZmiXmvqDXAT DRfElrSFGXjBUuoEVJBoV5DnqjQ5AkyFTwrm7gfhItgeDTnWg+VoPKJo0AILZowBJqnK KxVm/nk4Va2+upLK/pELUdO6v4ARLDMHUwWX1xW8hXeeTbR6ntKeTF8/Ju9YGUa4ej8j +j/g== X-Gm-Message-State: 
 AOAM5302hQvIGmzV+RSrDGP71ahNo3GLXWI61OybNTxHcN85EnLxlVL8
 TlikOui8S86LdM5IarublZ1S+foMKGg=
X-Google-Smtp-Source: ABdhPJw09eGgNy3EXbjdqGFxxJgFJV5YrUuGjNKkF5zMC9eqOSxrLDgPkbJpMCJxluIG+Ci2XveMUkYVc70=
X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404])
 (user=oupton job=sendgmr) by 2002:a02:cd09:: with SMTP id
 g9mr12684753jaq.87.1629072695306; Sun, 15 Aug 2021 17:11:35 -0700 (PDT)
Date: Mon, 16 Aug 2021 00:11:25 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-2-oupton@google.com>
Mime-Version: 1.0
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20210815_171137_296285_82F709A0
X-CRM114-Status: GOOD ( 15.43 )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
outside of the pvclock sync lock. This is problematic, as the clock
value written to the user may or may not actually correspond to a stable
TSC.

Fix the race by populating the entire kvm_clock_data structure behind
the pvclock_gtod_sync_lock.

Suggested-by: Sean Christopherson
Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc0c18339fb..2f3929bd5f58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2787,19 +2787,20 @@ static void kvm_update_masterclock(struct kvm *kvm)
 	kvm_end_pvclock_update(kvm);
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
-	u64 ret;
 
 	spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
 	if (!ka->use_master_clock) {
 		spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-		return get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		return;
 	}
 
+	data->flags |= KVM_CLOCK_TSC_STABLE;
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
 	spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2811,13 +2812,26 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		ret = __pvclock_read_cycles(&hv_clock, rdtsc());
-	} else
-		ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+	} else {
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+	}
 
 	put_cpu();
+}
 
-	return ret;
+u64 get_kvmclock_ns(struct kvm *kvm)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it's accessed RMW, leave everything else uninitialized
+	 * as clock is always written and no other fields are consumed.
+	 */
+	data.flags = 0;
+
+	get_kvmclock(kvm, &data);
+	return data.clock;
 }
 
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
@@ -6098,11 +6112,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 	case KVM_GET_CLOCK: {
 		struct kvm_clock_data user_ns;
-		u64 now_ns;
 
-		now_ns = get_kvmclock_ns(kvm);
-		user_ns.clock = now_ns;
-		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
+		/*
+		 * Zero flags as it is accessed RMW, leave everything else
+		 * uninitialized as clock is always written and no other fields
+		 * are consumed.
+		 */
+		user_ns.flags = 0;
+		get_kvmclock(kvm, &user_ns);
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;

From patchwork Mon Aug 16 00:11:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437449
Date: Mon, 16 Aug 2021 00:11:26 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-3-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

Wrap the existing implementation of the KVM_{GET,SET}_CLOCK ioctls in
helper methods.

No functional change intended.

Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 107 ++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f3929bd5f58..39eaa2fb2001 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5833,12 +5833,65 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 }
 #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */
 
+static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it is accessed RMW, leave everything else
+	 * uninitialized as clock is always written and no other fields
+	 * are consumed.
+	 */
+	data.flags = 0;
+	get_kvmclock(kvm, &data);
+	memset(&data.pad, 0, sizeof(data.pad));
+
+	if (copy_to_user(argp, &data, sizeof(data)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_clock_data data;
+	u64 now_ns;
+
+	if (copy_from_user(&data, argp, sizeof(data)))
+		return -EFAULT;
+
+	if (data.flags)
+		return -EINVAL;
+
+	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_start_pvclock_update(kvm);
+	pvclock_update_vm_gtod_copy(kvm);
+
+	/*
+	 * This pairs with kvm_guest_time_update(): when masterclock is
+	 * in use, we use master_kernel_ns + kvmclock_offset to set
+	 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
+	 * is slightly ahead) here we risk going negative on unsigned
+	 * 'system_time' when 'data.clock' is very small.
+	 */
+	if (kvm->arch.use_master_clock)
+		now_ns = ka->master_kernel_ns;
+	else
+		now_ns = get_kvmclock_base_ns();
+	ka->kvmclock_offset = data.clock - now_ns;
+	kvm_end_pvclock_update(kvm);
+
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	int r = -ENOTTY;
+
 	/*
 	 * This union makes it completely explicit to gcc-3.x
 	 * that these two variables' stack usage should be
@@ -6076,58 +6129,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
-	case KVM_SET_CLOCK: {
-		struct kvm_arch *ka = &kvm->arch;
-		struct kvm_clock_data user_ns;
-		u64 now_ns;
-
-		r = -EFAULT;
-		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
-			goto out;
-
-		r = -EINVAL;
-		if (user_ns.flags)
-			goto out;
-
-		r = 0;
-
-		kvm_hv_invalidate_tsc_page(kvm);
-		kvm_start_pvclock_update(kvm);
-		pvclock_update_vm_gtod_copy(kvm);
-
-		/*
-		 * This pairs with kvm_guest_time_update(): when masterclock is
-		 * in use, we use master_kernel_ns + kvmclock_offset to set
-		 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
-		 * is slightly ahead) here we risk going negative on unsigned
-		 * 'system_time' when 'user_ns.clock' is very small.
-		 */
-		if (kvm->arch.use_master_clock)
-			now_ns = ka->master_kernel_ns;
-		else
-			now_ns = get_kvmclock_base_ns();
-		ka->kvmclock_offset = user_ns.clock - now_ns;
-		kvm_end_pvclock_update(kvm);
+	case KVM_SET_CLOCK:
+		r = kvm_vm_ioctl_set_clock(kvm, argp);
 		break;
-	}
-	case KVM_GET_CLOCK: {
-		struct kvm_clock_data user_ns;
-
-		/*
-		 * Zero flags as it is accessed RMW, leave everything else
-		 * uninitialized as clock is always written and no other fields
-		 * are consumed.
-		 */
-		user_ns.flags = 0;
-		get_kvmclock(kvm, &user_ns);
-		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
-
-		r = -EFAULT;
-		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			goto out;
-		r = 0;
+	case KVM_GET_CLOCK:
+		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
-	}
 	case KVM_MEMORY_ENCRYPT_OP: {
 		r = -ENOTTY;
 		if (kvm_x86_ops.mem_enc_op)

From patchwork Mon Aug 16 00:11:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437445
Date: Mon, 16 Aug 2021 00:11:27 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-4-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a
(TSC, realtime)
clock pair for a single instant in time. In lieu of a more convenient
facility, KVM can report similar information in the kvm_clock structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 34 ++++++++++++++++++--------
 include/uapi/linux/kvm.h        |  7 +++++-
 4 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86d7ad3a126c..b3d12bf9fbf5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called. If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE. If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME. If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC. If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
 	__u64 clock;  /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REALTIME. If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+
 ::
 
   struct kvm_clock_data {
 	__u64 clock;  /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..7fad2615f4a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1916,4 +1916,7 @@ int kvm_cpu_dirty_log_size(void);
 
 int alloc_all_memslots_rmaps(struct kvm *kvm);
 
+#define KVM_CLOCK_VALID_FLAGS						\
+	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39eaa2fb2001..b1e9a4885be6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2809,10 +2809,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 	get_cpu();
 
 	if (__this_cpu_read(cpu_tsc_khz)) {
+#ifdef CONFIG_X86_64
+		struct timespec64 ts;
+
+		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
+			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
+			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
+		} else
+#endif
+		data->host_tsc = rdtsc();
+
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
 	} else {
 		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
 	}
@@ -4052,7 +4062,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_SYNC_X86_VALID_FIELDS;
 		break;
 	case KVM_CAP_ADJUST_CLOCK:
-		r = KVM_CLOCK_TSC_STABLE;
+		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
@@ -5837,14 +5847,8 @@ static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_clock_data data;
 
-	/*
-	 * Zero flags as it is accessed RMW, leave everything else
-	 * uninitialized as clock is always written and no other fields
-	 * are consumed.
-	 */
-	data.flags = 0;
+	memset(&data, 0, sizeof(data));
 	get_kvmclock(kvm, &data);
-	memset(&data.pad, 0, sizeof(data.pad));
 
 	if (copy_to_user(argp, &data, sizeof(data)))
 		return -EFAULT;
@@ -5861,13 +5865,23 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(&data, argp, sizeof(data)))
 		return -EFAULT;
 
-	if (data.flags)
+	if (data.flags & ~KVM_CLOCK_REALTIME)
 		return -EINVAL;
 
 	kvm_hv_invalidate_tsc_page(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 
+	if (data.flags & KVM_CLOCK_REALTIME) {
+		u64 now_real_ns = ktime_get_real_ns();
+
+		/*
+		 * Avoid stepping the kvmclock backwards.
+		 */
+		if (now_real_ns > data.realtime)
+			data.clock += now_real_ns - data.realtime;
+	}
+
 	/*
 	 * This pairs with kvm_guest_time_update(): when masterclock is
 	 * in use, we use master_kernel_ns + kvmclock_offset to set
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..d228bf394465 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1223,11 +1223,16 @@ struct kvm_irqfd {
 
 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */
 #define KVM_CLOCK_TSC_STABLE		2
+#define KVM_CLOCK_REALTIME		(1 << 2)
+#define KVM_CLOCK_HOST_TSC		(1 << 3)
 
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
 };
 
 /* For KVM_CAP_SW_TLB */

From patchwork Mon Aug 16 00:11:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437447
Date: Mon, 16 Aug 2021 00:11:28 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-5-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

A later change requires that the pvclock sync lock be taken while
holding the tsc_write_lock.
Change the locking in kvm_synchronize_tsc() now to satisfy that
requirement, isolating the locking change in its own commit.

Cc: Sean Christopherson
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/locking.rst | 11 +++++++++++
 arch/x86/kvm/x86.c                 |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..0bf346adac2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -36,6 +36,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -222,6 +225,14 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
 		migration.
 
+:Name:		kvm_arch::pvclock_gtod_sync_lock
+:Type:		raw_spinlock_t
+:Arch:		x86
+:Protects:	kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+			cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:	'raw' because updating the kvm master clock must not be
+		preempted.
+
 :Name:		kvm_arch::tsc_write_lock
 :Type:		raw_spinlock
 :Arch:		x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b1e9a4885be6..f1434cd388b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2533,7 +2533,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
 
 	kvm_vcpu_write_tsc_offset(vcpu, offset);
-	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
 	if (!matched) {
@@ -2544,6 +2543,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 
 	kvm_track_tsc_matching(vcpu);
 	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,

From patchwork Mon Aug 16 00:11:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437451
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-17.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,
	URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2D79EC4338F
	for ; Mon, 16 Aug 2021 00:15:02 +0000 (UTC)
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id E3D9A6137D
	for ; Mon, 16 Aug 2021 00:15:01 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E3D9A6137D
Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=v7pWGrPGWF5jSDX45s7cqH2QCqYcijm4GbkYfOqOisQ=; b=lBIszEBalwjPpzeHsHXsT/gfCF tgUR61Zpo+do0cg8L+hxrGU1cLiuMHc5ZIQQneY5A0HzfonXWM9JVtHxm5qST655R08fMhKvwwA71 qJ7vDVGjkpl7ws0GnpJBK0rgD9vFMYdB4A+nlNv45Zs4Ovy3Y4OtKWATv4wd3ocI4UBcbgcgEumNX aXtvF6CYsnxIDAHC7Y8/iLz87lGVJ2upVWPZ94LY3QCay3yarGH8kEbSrJ+de/MdksdygsiSC5UFQ hOb4A0rJXdtF6OE9L57UiqBIqA0zEMooTU3jMQZgKqsxugl4SdMlCC1/OancQBsvy0H0NX4EjcsBx +02Gn9+A==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mFQF2-00FlFX-AL; Mon, 16 Aug 2021 00:12:36 +0000 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mFQE9-00Fl7x-2Z for linux-arm-kernel@lists.infradead.org; Mon, 16 Aug 2021 00:11:42 +0000 Received: by mail-yb1-xb49.google.com with SMTP id i32-20020a25b2200000b02904ed415d9d84so15005824ybj.0 for ; Sun, 15 Aug 2021 17:11:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Dz4OQO7fko2xnm4bF9gtGHyuF6byVH0iKDoKQ9+K0Hk=; b=nG02hDlOtRC4KMB166Vy/WGh+WM1WqWLqoL9SZhFphHrdt7+VGLAwmWCuohkTHG09W qutphgh99ADbQ04rhvA8LKDsAP+PLuBGZVfX6CsIlwTBHn6j1ge13dmrDOQc/smYQfij KPWPWRIVRlB3EriEo/xoctVVzJUObu/x7xovD9XeuNYhRk6uYKiZw+BV5pFtGdV9a4tc 
nvsgIvqHBuviNCQWPM8HO56W+tOa30KchZXhGEeUXYEC/ZDclh79E9PPgJx/+wcbJCHb /w6w1lECCO5Ja8sYwFNhywkGt3KmcdRiSDJcb6AlSJRaOymWy44wwnqT4ZcbZ+U1s16l nAag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Dz4OQO7fko2xnm4bF9gtGHyuF6byVH0iKDoKQ9+K0Hk=; b=MKSCw5ZdWXnIL6JkC2sTjK/39lKvk6fgRdv5gBsusMuFQQTP7lk6kGU7agWGqIM1Ya FKXJGxA71kutPNey7V85XAho6IAD44qq5146q//B2U3vYR4oAPSZfC7zPgMyvgwYkGPs ZuQIh9Bdnyq+RElnfyiYv1tq9O9EOreHs91lLGbjVhOCN9ol7B3lSgMmx0baSHPLQ+mG YaiLC2C+lIURfcCebPk8iYqdhPHv7koUa0cUVVFxd5nnXGBQQjEl1nMgb0jeiNzvJmSg 9+/YyAjqqCmisk2hZaLWIHS1o9fNqAtXUuf0pkNwxbi2F09Hmu4L9ZNTULrISzTW+mDG V4Bw== X-Gm-Message-State: AOAM530By3dT5MWdrrADk9FyhT6nKeYCT3CBIIJRA/qkWZWBK8WcLOLH C6ppQZm693UG1Y1j1UAvr//X0OciEhc= X-Google-Smtp-Source: ABdhPJz9cl6CBqtVJqMeLgOBubeEV1ZIVpCy0Cv8/HD8GcR/AFOkA5Fnz9sYJeChQ3DsMEpVB1XuZCGCFoc= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a25:41ce:: with SMTP id o197mr18562489yba.365.1629072699675; Sun, 15 Aug 2021 17:11:39 -0700 (PDT) Date: Mon, 16 Aug 2021 00:11:29 +0000 In-Reply-To: <20210816001130.3059564-1-oupton@google.com> Message-Id: <20210816001130.3059564-6-oupton@google.com> Mime-Version: 1.0 References: <20210816001130.3059564-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog Subject: [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: 
	sfid-20210815_171141_183279_696F1D7F
X-CRM114-Status: GOOD ( 15.41 )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 105 ++++++++++++++++++++++++++-------------------
 1 file changed, 61 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1434cd388b9..9d0445527dad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2447,13 +2447,71 @@ static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				  u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+
+	lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+	already_matched =
+	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+	/*
+	 * We track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+	} else if (!already_matched) {
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2499,50 +2557,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			offset = kvm_compute_l1_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
-
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
-
-	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }
 

From patchwork Mon Aug 16 00:11:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437453
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-17.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,
INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3FBBC4338F for ; Mon, 16 Aug 2021 00:15:17 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6B51E6137D for ; Mon, 16 Aug 2021 00:15:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6B51E6137D Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=hJOX0CMm2hXD5kD7zUgo/UEBwG84pcIGnS+vhlk7SzA=; b=SsyJjpFScLCB3IjGAb/iOx9iVz ZL9b1pvQYqh4z8pVS2uvRyW4eZZVz+YAERyS91ej0xMzlS5ukhcJ4jlNfMgWWaZnHLjRTxcfBp0ez RGobuJ8e7QN588Cb5vyWsu1wAIWCaDdw9tMzVSmMJmv20iXPk3DjkNzAV6o03fQWva4/JVdIl1QF7 YE6g0ejX6f9vgKLSdDFdykXXW+BC19GLbO23NpuAFYJxwIBe6YLGnEbF9DZua28HW4xQFThCuV6z/ gc3Nu56mjIuWwQlkDK58Z3+y7DMSqlwzbDDrwZPkFIeW+PbCZJ3lC+LTObjPvtKzKh/TQrjqdgJoi 2+DWBmKQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mFQFY-00FlRt-Nv; Mon, 16 Aug 2021 00:13:09 +0000 Received: from mail-qv1-xf4a.google.com ([2607:f8b0:4864:20::f4a]) by bombadil.infradead.org with esmtps (Exim 4.94.2 
#2 (Red Hat Linux)) id 1mFQEA-00Fl8P-RJ for linux-arm-kernel@lists.infradead.org; Mon, 16 Aug 2021 00:11:47 +0000 Received: by mail-qv1-xf4a.google.com with SMTP id iw1-20020a0562140f2100b0035f58985cecso4010076qvb.10 for ; Sun, 15 Aug 2021 17:11:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=J/3kqHaHqhmnKwrHoMPM0ZlNGfNvJm2eth5nUFdCMDA=; b=lWAe2bkLnP2RZHENEsb3Km2+TAYR519X/MWjvSccLXFB2GldsN1VyRPQ5mW0F2BYlw E9v7oknYR2V65GsyUntrQnqDKFtRBVHvGhVVUDjamUf3oiSkNNuze/bWX8zPVJVJzXNC hHHwv6p56PtA8BIjOvzL7w53kdQqVjJbYIugRTjWxNWXnk+r/vDs2rqGf1iZY9bqSh9C DeEN2q7ZSoOT3+VNV9HhGyx1woN8GWQ0Vfj07KcMpKumaafSVDj+JBsFmuQeHd/cj1eI 4vd3yGpSENidn2i2tzDtNxKbl97FMVgvxNyFkAVx2x4+p4OLafFcs1G9EAKuhyf6/H8X P+pA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=J/3kqHaHqhmnKwrHoMPM0ZlNGfNvJm2eth5nUFdCMDA=; b=ss7WfmpEWbNIsYi3MEEEHDuItH95BjZooigfcFZopVMG7XoyXyt+Q2xBYlBTH10wPh uaMwtO0gtZlCNo65mo362TQJzA2P7EjTX1kArA+QVTfliWYo20m7/HSXgz5nwXIPFdZs FmrpPH9H8IxBrghso8IG8g/C3Q+sQ8ouGa2msffCGwnuTNd3ME68kH+FxfO7I/mhYePe aiLZcKyYcxmX8uq2kch1sFkSGUNOmt0E/8ld7UMBwCbsfgTYQBS4ypK1i9NG0BA7vmNK +ytbcWpNdeWMV+eF3Z+VaoDmC18qcvG8w8mk0XRiI1UCefqnEuraCT9ahKZTY38hhTUJ zJSQ== X-Gm-Message-State: AOAM533IfrzJvgDpCjFZ2CW2upBdiPg0d73vm22dhtmQqh6bw/iJ99Bi Dv4WYsj20E5UtBOh62GjVVSxy7EQrzQ= X-Google-Smtp-Source: ABdhPJyRvEL0jJdTP2zROUoA9tF1O2ZqfbMeyOeNxHBxzx4l3GZm/jafv62X7H59aGC5dQa5XzTEAGZtmbo= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6214:2465:: with SMTP id im5mr13584587qvb.46.1629072700634; Sun, 15 Aug 2021 17:11:40 -0700 (PDT) Date: Mon, 16 Aug 2021 00:11:30 +0000 In-Reply-To: <20210816001130.3059564-1-oupton@google.com> Message-Id: <20210816001130.3059564-7-oupton@google.com> Mime-Version: 
	1.0
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
	Andrew Jones, Will Deacon, Catalin Marinas, Oliver Upton
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20210815_171142_939933_26B07B0A
X-CRM114-Status: GOOD ( 24.32 )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

To date, VMM-directed TSC synchronization and migration have been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack
Cc: Sean Christopherson
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/devices/vcpu.rst |  57 +++++++++++++
 arch/x86/include/asm/kvm_host.h         |   1 +
 arch/x86/include/uapi/asm/kvm.h         |   4 +
 arch/x86/kvm/x86.c                      | 109 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..3b399d727c11 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.rst for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_VCPU_TSC_CTRL
+===========================
+
+:Architectures: x86
+
+4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
+
+:Parameters: 64-bit unsigned TSC offset
+
+Returns:
+
+	 ======= ======================================
+	 -EFAULT Error reading/writing the provided
+		 parameter address.
+	 -ENXIO  Attribute not supported
+	 ======= ======================================
+
+Specifies the guest's TSC offset relative to the host's TSC. The guest's
+TSC is then derived by the following equation:
+
+  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+
+This attribute is useful for the precise migration of a guest's TSC. The
+following describes a possible algorithm to use for the migration of a
+guest's TSC:
+
+From the source VMM process:
+
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
+   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).
+
+2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
+   guest TSC offset (off_n).
+
+3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
+   guest's TSC (freq).
+
+From the destination VMM process:
+
+4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
+   (k_0) and realtime nanoseconds (r_0) in their respective fields.
+   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+   structure. KVM will advance the VM's kvmclock to account for elapsed
+   time since recording the clock values.
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and
+   kvmclock nanoseconds (k_1).
+
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+   elapsed since recording state and (2) difference in TSCs between the
+   source and destination machine:
+
+   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
+
+7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
+   respective value derived in the previous step.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7fad2615f4a9..376b26a294c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1071,6 +1071,7 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
 	u32 last_tsc_khz;
+	u64 last_tsc_offset;
 	u64 cur_tsc_nsec;
 	u64 cur_tsc_write;
 	u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a6c327f8ad9e..0b22e1e84e78 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -503,4 +503,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d0445527dad..0b1398d439c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2470,6 +2470,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm->arch.last_tsc_nsec = ns;
 	kvm->arch.last_tsc_write = tsc;
 	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	kvm->arch.last_tsc_offset = offset;
 
 	vcpu->arch.last_guest_tsc = tsc;
 
@@ -4923,6 +4924,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = -EFAULT;
+		if (put_user(vcpu->arch.l1_tsc_offset, uaddr))
+			break;
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	struct kvm *kvm = vcpu->kvm;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET: {
+		u64 offset, tsc, ns;
+		unsigned long flags;
+		bool matched;
+
+		r = -EFAULT;
+		if (get_user(offset, uaddr))
+			break;
+
+		raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+		matched = (vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_offset == offset);
+
+		tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
+		ns = get_kvmclock_base_ns();
+
+		__kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
+		raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+
+		r = 0;
+		break;
+	}
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
+				      unsigned int ioctl,
+				      void __user *argp)
+{
+	struct kvm_device_attr attr;
+	int r;
+
+	if (copy_from_user(&attr, argp, sizeof(attr)))
+		return -EFAULT;
+
+	if (attr.group != KVM_VCPU_TSC_CTRL)
+		return -ENXIO;
+
+	switch (ioctl) {
+	case KVM_HAS_DEVICE_ATTR:
+		r = kvm_arch_tsc_has_attr(vcpu, &attr);
+		break;
+	case KVM_GET_DEVICE_ATTR:
+		r = kvm_arch_tsc_get_attr(vcpu, &attr);
+		break;
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_arch_tsc_set_attr(vcpu, &attr);
+		break;
+	}
+
+	return r;
+}
+
 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 				     struct kvm_enable_cap *cap)
 {
@@ -5377,6 +5481,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = __set_sregs2(vcpu, u.sregs2);
 		break;
 	}
+	case KVM_HAS_DEVICE_ATTR:
+	case KVM_GET_DEVICE_ATTR:
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp);
+		break;
 	default:
 		r = -EINVAL;
 	}
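
Illustrative note on step 6 of the vcpu.rst text in this patch: the offset
adjustment a destination VMM performs could be sketched in userspace C as
below. This is a minimal sketch under stated assumptions, not code from the
series. The helper names ns_to_tsc_cycles() and migrate_tsc_offset() are
hypothetical. It assumes k_0 and k_1 are in nanoseconds, that freq is in kHz
(the unit KVM_GET_TSC_KHZ reports), and that k_1 >= k_0, so the elapsed
kvmclock time is scaled to TSC cycles before being added. Offsets are treated
as wrapping 64-bit values, so a negative offset is simply its two's-complement
representation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert elapsed nanoseconds to TSC cycles for a
 * TSC frequency given in kHz. 1 kHz is one cycle per millisecond, i.e.
 * one cycle per 1,000,000 ns. Splitting into whole milliseconds and a
 * remainder keeps the intermediate products well inside 64 bits for
 * realistic frequencies.
 */
static uint64_t ns_to_tsc_cycles(uint64_t ns, uint64_t tsc_khz)
{
	uint64_t ms = ns / 1000000;
	uint64_t rem_ns = ns % 1000000;

	return ms * tsc_khz + (rem_ns * tsc_khz) / 1000000;
}

/*
 * Hypothetical helper implementing
 *   new_off_n = t_0 + off_n + elapsed_cycles(k_1 - k_0) - t_1
 * with unsigned wrap-around arithmetic. Assumes k_1 >= k_0.
 */
static uint64_t migrate_tsc_offset(uint64_t t_0, uint64_t off_n, uint64_t k_0,
				   uint64_t t_1, uint64_t k_1, uint64_t tsc_khz)
{
	uint64_t elapsed = ns_to_tsc_cycles(k_1 - k_0, tsc_khz);

	return t_0 + off_n + elapsed - t_1;
}
```

With these assumptions, the guest TSC seen on the destination at host TSC t_1
(t_1 + new_off_n) equals the guest TSC recorded on the source (t_0 + off_n)
plus the cycles that elapsed while the guest was paused.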