From patchwork Fri Sep 18 19:51:09 2009
X-Patchwork-Submitter: Mark Langsdorf
X-Patchwork-Id: 48624
From: Mark Langsdorf
To: linux-kernel@vger.kernel.org, avi@qumranet.com, kvm@vger.kernel.org
Subject: [PATCH][KVM][2/2] Support Pause Filter in AMD processors
Date: Fri, 18 Sep 2009 14:51:09 -0500
Message-ID: <200909181451.09942.mark.langsdorf@amd.com>
List-ID: <kvm.vger.kernel.org>

This patch depends on "[PATCH] Prevent immediate process rescheduling"
that I just submitted.

New AMD processors (Family 0x10, models 8 and later) support the Pause
Filter feature. This feature adds a new field to the VMCB called the
Pause Filter Count. If the Pause Filter Count is greater than 0 and
intercepting PAUSEs is enabled, the processor increments an internal
counter each time a PAUSE instruction occurs instead of intercepting
it. When the internal counter reaches the Pause Filter Count value, a
PAUSE intercept occurs.

This feature can be used to detect contended spinlocks, especially when
the lock-holding VCPU is not scheduled. Rescheduling another VCPU
prevents the VCPU seeking the lock from wasting its quantum by spinning
idly. Experimental results show that most spinlocks are held for fewer
than 1000 PAUSE cycles or for more than a few thousand, so the Pause
Filter Count defaults to 3000 to detect the contended case. Processor
support for this feature is indicated by a CPUID bit.
On a 24-core system running 4 guests, each with 16 VCPUs, this patch
improved the overall performance of each guest's 32-job kernbench by
approximately 3-5% when combined with a scheduler algorithm that
guaranteed that the yielding process would not be immediately
rescheduled. Further performance improvement may be possible with a
more sophisticated yield algorithm.

-Mark Langsdorf
Operating System Research Center
AMD

Signed-off-by: Mark Langsdorf

---
 arch/x86/include/asm/svm.h |    3 ++-
 arch/x86/kvm/svm.c         |   13 +++++++++++++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u16 intercept_dr_write;
 	u32 intercept_exceptions;
 	u64 intercept;
-	u8 reserved_1[44];
+	u8 reserved_1[42];
+	u16 pause_filter_count;
 	u64 iopm_base_pa;
 	u64 msrpm_base_pa;
 	u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a2f2d43..28c49d0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
 
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
 #define NESTED_EXIT_DONE	1	/* Exit caused nested vmexit  */
@@ -654,6 +655,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 	svm->nested.vmcb = 0;
 	svm->vcpu.arch.hflags = 0;
 
+	if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+		control->pause_filter_count = 3000;
+		control->intercept |= (1ULL << INTERCEPT_PAUSE);
+	}
+
 	enable_gif(svm);
 }
 
@@ -2255,6 +2261,12 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static int pause_interception(struct vcpu_svm *svm)
+{
+	schedule();
+	return 1;
+}
+
 static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= emulate_on_interception,
 	[SVM_EXIT_READ_CR3]			= emulate_on_interception,
@@ -2290,6 +2302,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_CPUID]			= cpuid_interception,
 	[SVM_EXIT_IRET]				= iret_interception,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
+	[SVM_EXIT_PAUSE]			= pause_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
 	[SVM_EXIT_INVLPGA]			= invlpga_interception,