From patchwork Tue Aug 2 12:54:20 2011
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 1029382
Date: Tue, 2 Aug 2011 15:54:20 +0300
Message-Id: <201108021254.p72CsKAL002130@rice.haifa.ibm.com>
From: "Nadav Har'El"
To: kvm@vger.kernel.org
Cc: "Roedel, Joerg", Zachary Amsden, Bandan Das, Marcelo Tosatti, avi@redhat.com
References: <1312289591-nyh@il.ibm.com>
Subject: [PATCH 1/3] L1 TSC handling

KVM assumed in several places that reading the TSC MSR returns the value for
L1. This is incorrect, because when L2 is running, the correct TSC read exit
emulation is to return L2's value.

We therefore add a new x86_ops function, read_l1_tsc, to be used in places
that specifically need to read the L1 TSC, not the TSC of whichever guest
level is currently running.

Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
by a different patch sent by Zachary Amsden (and not yet applied):
kvm_arch_vcpu_load() should not read the guest TSC, and once it no longer
does, the kvm_get_msr() call there would not have needed to be converted to
read_l1_tsc().
Signed-off-by: Nadav Har'El
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              |  9 +++++++++
 arch/x86/kvm/vmx.c              | 17 +++++++++++++++++
 arch/x86/kvm/x86.c              |  8 ++++----
 4 files changed, 32 insertions(+), 4 deletions(-)

--- .before/arch/x86/include/asm/kvm_host.h	2011-08-02 15:51:02.000000000 +0300
+++ .after/arch/x86/include/asm/kvm_host.h	2011-08-02 15:51:02.000000000 +0300
@@ -636,6 +636,8 @@ struct kvm_x86_ops {
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage);
 
+	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu);
+
 	const struct trace_print_flags *exit_reasons_str;
 };
--- .before/arch/x86/kvm/vmx.c	2011-08-02 15:51:02.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-08-02 15:51:02.000000000 +0300
@@ -1748,6 +1748,21 @@ static u64 guest_read_tsc(void)
 }
 
 /*
+ * Like guest_read_tsc, but always returns L1's notion of the timestamp
+ * counter, even if a nested guest (L2) is currently running.
+ */
+u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu)
+{
+	u64 host_tsc, tsc_offset;
+
+	rdtscll(host_tsc);
+	tsc_offset = is_guest_mode(vcpu) ?
+		to_vmx(vcpu)->nested.vmcs01_tsc_offset :
+		vmcs_read64(TSC_OFFSET);
+	return host_tsc + tsc_offset;
+}
+
+/*
  * Empty call-back. Needs to be implemented when VMX enables the SET_TSC_KHZ
  * ioctl. In this case the call-back should update internal vmx state to make
  * the changes effective.
@@ -7059,6 +7074,8 @@ static struct kvm_x86_ops vmx_x86_ops =
 	.set_tdp_cr3 = vmx_set_cr3,
 
 	.check_intercept = vmx_check_intercept,
+
+	.read_l1_tsc = vmx_read_l1_tsc,
 };
 
 static int __init vmx_init(void)
--- .before/arch/x86/kvm/svm.c	2011-08-02 15:51:02.000000000 +0300
+++ .after/arch/x86/kvm/svm.c	2011-08-02 15:51:02.000000000 +0300
@@ -2894,6 +2894,13 @@ static int cr8_write_interception(struct
 	return 0;
 }
 
+u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu)
+{
+	struct vmcb *vmcb = get_host_vmcb(to_svm(vcpu));
+	return vmcb->control.tsc_offset +
+		svm_scale_tsc(vcpu, native_read_tsc());
+}
+
 static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4243,6 +4250,8 @@ static struct kvm_x86_ops svm_x86_ops =
 	.set_tdp_cr3 = set_tdp_cr3,
 
 	.check_intercept = svm_check_intercept,
+
+	.read_l1_tsc = svm_read_l1_tsc,
 };
 
 static int __init svm_init(void)
--- .before/arch/x86/kvm/x86.c	2011-08-02 15:51:02.000000000 +0300
+++ .after/arch/x86/kvm/x86.c	2011-08-02 15:51:02.000000000 +0300
@@ -1098,7 +1098,7 @@ static int kvm_guest_time_update(struct
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
-	kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
+	tsc_timestamp = kvm_x86_ops->read_l1_tsc(v);
 	kernel_ns = get_kernel_ns();
 	this_tsc_khz = vcpu_tsc_khz(v);
 	if (unlikely(this_tsc_khz == 0)) {
@@ -2215,7 +2215,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
 		s64 tsc_delta;
 		u64 tsc;
 
-		kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
+		tsc = kvm_x86_ops->read_l1_tsc(vcpu);
 		tsc_delta = !vcpu->arch.last_guest_tsc ?
 			0 : tsc - vcpu->arch.last_guest_tsc;
@@ -2239,7 +2239,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *
 {
 	kvm_x86_ops->vcpu_put(vcpu);
 	kvm_put_guest_fpu(vcpu);
-	kvm_get_msr(vcpu, MSR_IA32_TSC, &vcpu->arch.last_guest_tsc);
+	vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu);
 }
 
 static int is_efer_nx(void)
@@ -5722,7 +5722,7 @@ static int vcpu_enter_guest(struct kvm_v
 	if (hw_breakpoint_active())
 		hw_breakpoint_restore();
 
-	kvm_get_msr(vcpu, MSR_IA32_TSC, &vcpu->arch.last_guest_tsc);
+	vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu);
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();