Message ID | 200905071000.14038.mark.langsdorf@amd.com (mailing list archive) |
---|---|
State | New, archived |
(copying Ingo)

Mark Langsdorf wrote:
> commit 01813db8627e74018c8cec90df7e345839351f23
> Author: root <root@xendinar01.amd.com>
> Date:   Thu May 7 09:44:10 2009 -0500
>
>     New AMD processors will support the Pause Filter Feature.
>     This feature creates a new field in the VMCB called Pause
>     Filter Count. If Pause Filter Count is greater than 0 and
>     intercepting PAUSEs is enabled, the processor will increment
>     an internal counter when a PAUSE instruction occurs instead
>     of intercepting. When the internal counter reaches the
>     Pause Filter Count value, a PAUSE intercept will occur.
>
>     This feature can be used to detect contended spinlocks,
>     especially when the lock holding VCPU is not scheduled.
>     Rescheduling another VCPU prevents the VCPU seeking the
>     lock from wasting its quantum by spinning idly.
>
>     Experimental results show that most spinlocks are held
>     for less than 1000 PAUSE cycles or more than a few
>     thousand. Default the Pause Filter Counter to 3000 to
>     detect the contended spinlocks.
>
>     Processor support for this feature is indicated by a CPUID
>     bit.
>
>     On a 24 core system running 4 guests each with 16 VCPUs,
>     this patch improved overall performance of each guest's
>     32 job kernbench by approximately 1%. Further performance
>     improvement may be possible with a more sophisticated
>     yield algorithm.
>
>     -Mark Langsdorf
>     Operating System Research Center
>     AMD
>
>     Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>

(please use git format-patch rather than git show, and set up user.name
and user.email properly)

> 	svm->nested_vmcb = 0;
> 	svm->vcpu.arch.hflags = HF_GIF_MASK;
> +
> +	if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
> +		control->pause_filter_count = 5000;
> +		control->intercept |= (1ULL << INTERCEPT_PAUSE);
> +	}
> +
> }

3000 or 5000?

> +static int pause_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	/* Simple yield */
> +	vcpu_put(&svm->vcpu);
> +	schedule();
> +	vcpu_load(&svm->vcpu);
> +	return 1;
> +

Ingo, will this do anything under CFS, or will CFS note that nothing has
changed in the accounting and reschedule us immediately?
* Avi Kivity <avi@redhat.com> wrote:

>> +static int pause_interception(struct vcpu_svm *svm, struct kvm_run
>> *kvm_run)
>> +{
>> +	/* Simple yield */
>> +	vcpu_put(&svm->vcpu);
>> +	schedule();
>> +	vcpu_load(&svm->vcpu);
>> +	return 1;
>> +
>
> Ingo, will this do anything under CFS, or will CFS note that
> nothing has changed in the accounting and reschedule us
> immediately?

The scheduler will yield to another task only if the current task
has become ineligible. I.e schedule() is largely a NOP on
TASK_RUNNING tasks (i.e. here).

I.e. this is a somewhat poor solution as far as scheduling goes. But
i'm wondering what the CPU side does. Can REP-NOP really take
thousands of cycles? If yes, under what circumstances?

	Ingo
Ingo Molnar wrote:
>>> +static int pause_interception(struct vcpu_svm *svm, struct kvm_run
>>> *kvm_run)
>>> +{
>>> +	/* Simple yield */
>>> +	vcpu_put(&svm->vcpu);
>>> +	schedule();
>>> +	vcpu_load(&svm->vcpu);
>>> +	return 1;
>>> +
>>>
>> Ingo, will this do anything under CFS, or will CFS note that
>> nothing has changed in the accounting and reschedule us
>> immediately?
>>
>
> The scheduler will yield to another task only if the current task
> has become ineligible. I.e schedule() is largely a NOP on
> TASK_RUNNING tasks (i.e. here).
>

Especially on preemptible kernels, where the schedule() would have
already happened if it could cause anything, IIUC.

> I.e. this is a somewhat poor solution as far as scheduling goes. But
> i'm wondering what the CPU side does. Can REP-NOP really take
> thousands of cycles? If yes, under what circumstances?
>

The guest is running rep-nop in a loop while trying to acquire a
spinlock.  The hardware detects this (most likely, repeated rep-nop
with the same rip) and exits.  We can program the loop count;
obviously if we're spinning for only a short while it's better to
keep spinning while hoping the lock will be released soon.

The idea is to detect that the guest is not making forward progress
and yield.  If I could tell the scheduler, you may charge me a couple
of milliseconds, I promise not to sue, that would be ideal.  Other
tasks can become eligible, hopefully the task holding the spinlock,
and by the time we're scheduled back the long running task will have
finished and released the lock.

For newer Linux as a guest we're better off paravirtualizing this, so
we can tell the host which vcpu holds the lock; in this case kvm will
want to say, take a couple milliseconds off my account and transfer it
to this task (so called directed yield).  However there's no reason to
paravirtualize all cpu_relax() calls.
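The "directed yield" above is only sketched in prose in this thread. As a purely hypothetical illustration of the guest half, it could look roughly like the snippet below; the hypercall number, the helper name and the assumption that a paravirtualized lock records its holder's vcpu id are all invented for the example, not existing kernel or KVM interfaces.

```c
/*
 * Hypothetical guest-side sketch of a "directed yield" hypercall.
 * Nothing here exists in the posted patch or in KVM at the time of this
 * thread; KVM_HC_SPIN_YIELD and the vcpu-id-in-the-lock scheme are
 * assumptions made only for illustration.
 */
#include <asm/kvm_para.h>

#define KVM_HC_SPIN_YIELD	16	/* hypothetical hypercall number */

/*
 * Called from a paravirtualized spinlock slow path once we have spun
 * for a while; holder_vcpu_id would have been recorded in the lock word
 * by the vcpu that acquired it.
 */
static void kvm_spin_yield_to(unsigned long holder_vcpu_id)
{
	kvm_hypercall1(KVM_HC_SPIN_YIELD, holder_vcpu_id);
}
```

On the host side the handler would take time off the spinner's account and, ideally, transfer it to the holder's task -- which is exactly the scheduler support debated in the rest of the thread.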
* Avi Kivity <avi@redhat.com> wrote:

>> I.e. this is a somewhat poor solution as far as scheduling goes.
>> But i'm wondering what the CPU side does. Can REP-NOP really take
>> thousands of cycles? If yes, under what circumstances?
>
> The guest is running rep-nop in a loop while trying to acquire a
> spinlock. The hardware detects this (most likely, repeated
> rep-nop with the same rip) and exits. We can program the loop
> count; obviously if we're spinning for only a short while it's
> better to keep spinning while hoping the lock will be released
> soon.
>
> The idea is to detect that the guest is not making forward
> progress and yield. If I could tell the scheduler, you may charge
> me a couple of milliseconds, I promise not to sue, that would be
> ideal. [...]

Ok, with such a waiver, who could refuse?

This really needs a new kernel-internal scheduler API though, which
does a lot of fancy things to do:

	se->vruntime += 1000000;

i.e. add 1 msec worth of nanoseconds to the task's timeline. (first
remove it from the rbtree, then add it back, and nice-weight it as
well)

And only do it if there's other tasks running on this CPU or so.

_That_ would be pretty efficient, and would do the right thing when
two (or more) vcpus run on the same CPU, and it would also do the
right thing if there are repeated VM-exits due to pause filtering.

Please dont even think about using yield for this though - that will
just add a huge hit to this task and wont result in any sane
behavior - and yield is bound to some historic user-space behavior
as well.

A gradual and linear back-off from the current timeline is more of a
fair negotiation process between vcpus and results in more or less
sane (and fair) scheduling, and no unnecessary looping.

You could even do an exponential backoff up to a limit of 1-10 msecs
or so, starting at 100 usecs.

	Ingo
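To make the suggestion above concrete, here is a minimal, hypothetical sketch of such a helper. The name sched_charge_vruntime() is invented, and cfs_rq_of()/calc_delta_fair() are CFS-internal helpers in kernel/sched_fair.c of this era, so a real implementation would have to live there; none of this is part of the posted patch.

```c
/*
 * Hypothetical sketch of the kernel-internal API Ingo describes: push
 * the current task's vruntime forward by a nice-weighted penalty so
 * other runnable tasks become eligible, without the semantics (or cost)
 * of yield().  Not an existing interface.
 */
void sched_charge_vruntime(u64 penalty_ns)
{
	struct task_struct *p = current;
	struct sched_entity *se = &p->se;
	struct cfs_rq *cfs_rq = cfs_rq_of(se);

	/* Only meaningful if somebody else can actually use this CPU. */
	if (cfs_rq->nr_running < 2)
		return;

	/*
	 * Ingo's "remove from the rbtree, add it back" applies to a queued
	 * entity; the task running this code is cfs_rq->curr and is not in
	 * the rbtree, so warping its vruntime and requesting a reschedule
	 * is enough here.
	 */
	se->vruntime += calc_delta_fair(penalty_ns, se);	/* nice-weighted */

	set_tsk_need_resched(p);	/* let the new leftmost task run */
}
```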
On Mon, 2009-05-11 at 17:24 +0300, Avi Kivity wrote:
>> I.e. this is a somewhat poor solution as far as scheduling goes. But
>> i'm wondering what the CPU side does. Can REP-NOP really take
>> thousands of cycles? If yes, under what circumstances?
>>
>
> The guest is running rep-nop in a loop while trying to acquire a
> spinlock. The hardware detects this (most likely, repeated rep-nop with
> the same rip) and exits. We can program the loop count; obviously if
> we're spinning for only a short while it's better to keep spinning while
> hoping the lock will be released soon.
>
> The idea is to detect that the guest is not making forward progress and
> yield. If I could tell the scheduler, you may charge me a couple of
> milliseconds, I promise not to sue, that would be ideal. Other tasks
> can become eligible, hopefully the task holding the spinlock, and by the
> time we're scheduled back the long running task will have finished and
> released the lock.
>
> For newer Linux as a guest we're better off paravirtualizing this, so we
> can tell the host which vcpu holds the lock; in this case kvm will want
> to say, take a couple milliseconds off my account and transfer it to
> this task (so called directed yield). However there's no reason to
> paravirtualize all cpu_relax() calls.

So we're now officially giving up on (soft) realtime virtualization?
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> I.e. this is a somewhat poor solution as far as scheduling goes.
>>> But i'm wondering what the CPU side does. Can REP-NOP really take
>>> thousands of cycles? If yes, under what circumstances?
>>>
>> The guest is running rep-nop in a loop while trying to acquire a
>> spinlock. The hardware detects this (most likely, repeated
>> rep-nop with the same rip) and exits. We can program the loop
>> count; obviously if we're spinning for only a short while it's
>> better to keep spinning while hoping the lock will be released
>> soon.
>>
>> The idea is to detect that the guest is not making forward
>> progress and yield. If I could tell the scheduler, you may charge
>> me a couple of milliseconds, I promise not to sue, that would be
>> ideal. [...]
>>
>
> Ok, with such a waiver, who could refuse?
>
> This really needs a new kernel-internal scheduler API though, which
> does a lot of fancy things to do:
>
>	se->vruntime += 1000000;
>
> i.e. add 1 msec worth of nanoseconds to the task's timeline. (first
> remove it from the rbtree, then add it back, and nice-weight it as
> well)

I suspected it would be as simple as this.

> And only do it if there's other tasks running on this CPU or
> so.
>

What would happen if there weren't?  I'd guess the task would
continue running (but with a warped vruntime)?

> _That_ would be pretty efficient, and would do the right thing when
> two (or more) vcpus run on the same CPU, and it would also do the
> right thing if there are repeated VM-exits due to pause filtering.
>
> Please dont even think about using yield for this though - that will
> just add a huge hit to this task and wont result in any sane
> behavior - and yield is bound to some historic user-space behavior
> as well.
>
> A gradual and linear back-off from the current timeline is more of a
> fair negotiation process between vcpus and results in more or less
> sane (and fair) scheduling, and no unnecessary looping.
>
> You could even do an exponential backoff up to a limit of 1-10 msecs
> or so, starting at 100 usecs.
>

Good idea, it eliminates another variable to be tuned.
* Avi Kivity <avi@redhat.com> wrote:

>> And only do it if there's other tasks running on this CPU or so.
>
> What would happen if there weren't? I'd guess the task would
> continue running (but with a warped vruntime)?

We dont want that warping to occur - we just want to go back and
burn CPU time in VM context. The problem is (as Peter pointed it
out) that this hw facility is incomplete and does not give us any
event (interrupt) and does not give us any event key (address we are
waiting for) either.

So the next best thing to do is to go back to the guest, because
that is where we make the most progress, more likely, and that is
where we want to be to make progress immediately, with the shortest
latency.

( Perhaps we could also increase vruntime beyond the standard
  latency value to make sure any freshly woken task gets executed
  first if we are still looping. )

>> _That_ would be pretty efficient, and would do the right thing when
>> two (or more) vcpus run on the same CPU, and it would also do the
>> right thing if there are repeated VM-exits due to pause filtering.
>>
>> Please dont even think about using yield for this though - that will
>> just add a huge hit to this task and wont result in any sane behavior -
>> and yield is bound to some historic user-space behavior as well.
>>
>> A gradual and linear back-off from the current timeline is more of a
>> fair negotiation process between vcpus and results in more or less
>> sane (and fair) scheduling, and no unnecessary looping.
>>
>> You could even do an exponential backoff up to a limit of 1-10 msecs
>> or so, starting at 100 usecs.
>
> Good idea, it eliminates another variable to be tuned.

It could be made fully self-tuning, if the filter threshold can be
tuned fast enough. (an MSR write? A VM context field update?)

I.e. the 3000 cycles value itself could be eliminated as well. (with
just a common-sense max of say 100,000 cycles enforced)

	Ingo
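The "fully self-tuning" threshold mentioned above could plausibly be driven from the VMCB field the patch adds. The grow/shrink policy and the constants below are invented for illustration (and note the field is a u16, so the practical ceiling is 65535, somewhat below the 100,000 cycles Ingo mentions); none of this is part of the posted patch.

```c
/*
 * Sketch only: self-tune the PAUSE filter threshold instead of using a
 * fixed 3000/5000 value.  pause_filter_count is the u16 VMCB field
 * added by the patch; the policy and constants here are illustrative.
 */
#define PAUSE_FILTER_MIN	512
#define PAUSE_FILTER_MAX	65535	/* u16 VMCB field */

/* On every PAUSE intercept: be more patient before exiting again. */
static void grow_pause_filter(struct vcpu_svm *svm)
{
	struct vmcb_control_area *control = &svm->vmcb->control;

	control->pause_filter_count =
		min_t(u32, control->pause_filter_count * 2, PAUSE_FILTER_MAX);
}

/* When the vcpu runs for a while with no PAUSE intercepts (hook not shown). */
static void shrink_pause_filter(struct vcpu_svm *svm)
{
	struct vmcb_control_area *control = &svm->vmcb->control;

	control->pause_filter_count =
		max_t(u32, control->pause_filter_count / 2, PAUSE_FILTER_MIN);
}
```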
On Mon, 2009-05-11 at 17:51 +0300, Avi Kivity wrote:
> Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>>> I.e. this is a somewhat poor solution as far as scheduling goes.
>>>> But i'm wondering what the CPU side does. Can REP-NOP really take
>>>> thousands of cycles? If yes, under what circumstances?
>>>>
>>> The guest is running rep-nop in a loop while trying to acquire a
>>> spinlock. The hardware detects this (most likely, repeated
>>> rep-nop with the same rip) and exits. We can program the loop
>>> count; obviously if we're spinning for only a short while it's
>>> better to keep spinning while hoping the lock will be released
>>> soon.
>>>
>>> The idea is to detect that the guest is not making forward
>>> progress and yield. If I could tell the scheduler, you may charge
>>> me a couple of milliseconds, I promise not to sue, that would be
>>> ideal. [...]
>>>
>>
>> Ok, with such a waiver, who could refuse?
>>
>> This really needs a new kernel-internal scheduler API though, which
>> does a lot of fancy things to do:
>>
>>	se->vruntime += 1000000;
>>
>> i.e. add 1 msec worth of nanoseconds to the task's timeline. (first
>> remove it from the rbtree, then add it back, and nice-weight it as
>> well)
>
> I suspected it would be as simple as this.

Is that thread guaranteed to run as SCHED_OTHER?
Peter Zijlstra wrote:
> On Mon, 2009-05-11 at 17:24 +0300, Avi Kivity wrote:
>
>>> I.e. this is a somewhat poor solution as far as scheduling goes. But
>>> i'm wondering what the CPU side does. Can REP-NOP really take
>>> thousands of cycles? If yes, under what circumstances?
>>>
>> The guest is running rep-nop in a loop while trying to acquire a
>> spinlock. The hardware detects this (most likely, repeated rep-nop with
>> the same rip) and exits. We can program the loop count; obviously if
>> we're spinning for only a short while it's better to keep spinning while
>> hoping the lock will be released soon.
>>
>> The idea is to detect that the guest is not making forward progress and
>> yield. If I could tell the scheduler, you may charge me a couple of
>> milliseconds, I promise not to sue, that would be ideal. Other tasks
>> can become eligible, hopefully the task holding the spinlock, and by the
>> time we're scheduled back the long running task will have finished and
>> released the lock.
>>
>> For newer Linux as a guest we're better off paravirtualizing this, so we
>> can tell the host which vcpu holds the lock; in this case kvm will want
>> to say, take a couple milliseconds off my account and transfer it to
>> this task (so called directed yield). However there's no reason to
>> paravirtualize all cpu_relax() calls.
>
> So we're now officially giving up on (soft) realtime virtualization?

Wouldn't realtime guests be in a realtime scheduling class?  That ought
to ignore this time_yield() (or however it is eventually named).
Peter Zijlstra wrote:
>>> This really needs a new kernel-internal scheduler API though, which
>>> does a lot of fancy things to do:
>>>
>>>	se->vruntime += 1000000;
>>>
>>> i.e. add 1 msec worth of nanoseconds to the task's timeline. (first
>>> remove it from the rbtree, then add it back, and nice-weight it as
>>> well)
>>>
>> I suspected it would be as simple as this.
>>
>
> Is that thread guaranteed to run as SCHED_OTHER?

No, it's user specified.  But if it isn't SCHED_OTHER, we don't mind if
it spins (and we hope realtime systems aren't overcommitted).
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> And only do it if there's other tasks running on this CPU or so.
>>>
>> What would happen if there weren't?  I'd guess the task would
>> continue running (but with a warped vruntime)?
>>
>
> We dont want that warping to occur - we just want to go back and
> burn CPU time in VM context. The problem is (as Peter pointed it
> out) that this hw facility is incomplete and does not give us any
> event (interrupt) and does not give us any event key (address we are
> waiting for) either.
>
> So the next best thing to do is to go back to the guest, because
> that is where we make the most progress, more likely, and that is
> where we want to be to make progress immediately, with the shortest
> latency.
>

Right, but I thought vruntime += blah would go back into the guest if
there were no other runnable tasks.  Oh I see -- we'd exit immediately
and warp vruntime very fast as long as we're spinning.

> ( Perhaps we could also increase vruntime beyond the standard
>   latency value to make sure any freshly woken task gets executed
>   first if we are still looping. )
>

We could be halfway through our remaining time.  We could set it to the
next task + epsilon.  But that may be too aggressive.  If the lock
holder is on another cpu, it may well complete before the standard
latency value expires.  Since we're giving up cpu time potentially to
other guests, we don't want to give too much.  If we wake up too soon,
we'll spin for a few microseconds and yield again.  So I think your
100 us + exponential backoff is best.

>>> _That_ would be pretty efficient, and would do the right thing when
>>> two (or more) vcpus run on the same CPU, and it would also do the
>>> right thing if there are repeated VM-exits due to pause filtering.
>>>
>>> Please dont even think about using yield for this though - that will
>>> just add a huge hit to this task and wont result in any sane behavior -
>>> and yield is bound to some historic user-space behavior as well.
>>>
>>> A gradual and linear back-off from the current timeline is more of a
>>> fair negotiation process between vcpus and results in more or less
>>> sane (and fair) scheduling, and no unnecessary looping.
>>>
>>> You could even do an exponential backoff up to a limit of 1-10 msecs
>>> or so, starting at 100 usecs.
>>>
>> Good idea, it eliminates another variable to be tuned.
>>
>
> It could be made fully self-tuning, if the filter threshold can be
> tuned fast enough. (an MSR write? A VM context field update?)
>

The latter.

> I.e. the 3000 cycles value itself could be eliminated as well. (with
> just a common-sense max of say 100,000 cycles enforced)
>

Yeah, though that has a much smaller effect as it's only responsible
for a few microseconds of spinning.
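Putting the pieces of this subthread together, the intercept handler might end up looking roughly like the sketch below. The per-vcpu pause_backoff_ns field and the sched_charge_vruntime() helper (sketched earlier) are hypothetical; the 100 usec start and 10 msec cap are Ingo's numbers. Something would also have to reset the backoff once the vcpu makes progress (for instance on the next non-PAUSE exit), which is omitted here.

```c
/*
 * Sketch only -- not the posted patch.  Replace the bare schedule() with
 * a vruntime charge that backs off exponentially from 100us to 10ms.
 * vcpu->arch.pause_backoff_ns is a hypothetical per-vcpu field and
 * sched_charge_vruntime() is the hypothetical helper sketched earlier.
 */
static int pause_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
{
	struct kvm_vcpu *vcpu = &svm->vcpu;
	u64 backoff = vcpu->arch.pause_backoff_ns;

	/* Start at 100 usecs, double on each consecutive PAUSE exit, cap at 10 msecs. */
	if (backoff < 100 * NSEC_PER_USEC)
		backoff = 100 * NSEC_PER_USEC;
	else
		backoff = min_t(u64, backoff * 2, 10 * NSEC_PER_MSEC);
	vcpu->arch.pause_backoff_ns = backoff;

	/*
	 * Charge the backoff to our CFS timeline rather than yielding, so
	 * another task (hopefully the lock holder's vcpu) becomes eligible.
	 */
	sched_charge_vruntime(backoff);

	return 1;
}
```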
* Avi Kivity <avi@redhat.com> wrote:

>> I.e. the 3000 cycles value itself could be eliminated as well.
>> (with just a common-sense max of say 100,000 cycles enforced)
>
> Yeah, though that has a much smaller effect as it's only
> responsible for a few microseconds of spinning.

3000 cycles would be 1-2 usecs. Isnt the VM exit+entry cost still in
that range?

	Ingo
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> I.e. the 3000 cycles value itself could be eliminated as well.
>>> (with just a common-sense max of say 100,000 cycles enforced)
>>>
>> Yeah, though that has a much smaller effect as it's only
>> responsible for a few microseconds of spinning.
>>
>
> 3000 cycles would be 1-2 usecs. Isnt the VM exit+entry cost still in
> that range?
>

It's 3000 executions of rep nop, so you need to account for the entire
spinlock loop body.  The Linux spinlock is

	"1:\t"
	"cmpl %0, %2\n\t"
	"je 2f\n\t"
	"rep ; nop\n\t"
	"movzwl %1, %2\n\t"
	/* don't need lfence here, because loads are in-order */
	"jmp 1b\n"

5 instructions, maybe 2-3 cycles, not counting any special rep nop
overhead.  Mark, any idea what the spin time is?

VM entry/exit is around 1us on the newer processors.
> Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>>> I.e. the 3000 cycles value itself could be eliminated as well.
>>>> (with just a common-sense max of say 100,000 cycles enforced)
>>>>
>>> Yeah, though that has a much smaller effect as it's only
>>> responsible for a few microseconds of spinning.
>>>
>>
>> 3000 cycles would be 1-2 usecs. Isnt the VM exit+entry cost
>> still in that range?

For the processors that support this feature, VM exit+entry is a
little over 2000 cycles.

> It's 3000 executions of rep nop, so you need to account for the
> entire spinlock loop body.
>
> The Linux spinlock is
>
>	"1:\t"
>	"cmpl %0, %2\n\t"
>	"je 2f\n\t"
>	"rep ; nop\n\t"
>	"movzwl %1, %2\n\t"
>	/* don't need lfence here, because loads are in-order */
>	"jmp 1b\n"
>
> 5 instructions, maybe 2-3 cycles, not counting any special rep nop
> overhead.  Mark, any idea what the spin time is?

If I'm understanding the question right, the contested spin locks are
being held for 5K to 10K iterations of PAUSE.  So 10K to 30K cycles if
your estimate of the spinlock cycle time is correct.

-Mark Langsdorf
Operating System Research Center
AMD
Langsdorf, Mark wrote:
>> The Linux spinlock is
>>
>>	"1:\t"
>>	"cmpl %0, %2\n\t"
>>	"je 2f\n\t"
>>	"rep ; nop\n\t"
>>	"movzwl %1, %2\n\t"
>>	/* don't need lfence here, because loads are in-order */
>>	"jmp 1b\n"
>>
>> 5 instructions, maybe 2-3 cycles, not counting any special rep nop
>> overhead.  Mark, any idea what the spin time is?
>>
>
> If I'm understanding the question right, the contested
> spin locks are being held for 5K to 10K iterations of PAUSE.
> So 10K to 30K cycles if your estimate of the spinlock
> cycle time is correct.
>

My estimate is not very reliable.  Can you measure this?

btw, I'd expect you'd get much more significant improvement on Windows.
Booting 16-vcpu Windows 2008 x64 took forever on my dual core host, and
my guess is spinlock contention.  Of course you'd need a working yield,
as Ingo said schedule() does very little.
>>> Please dont even think about using yield for this though -

Oops.  I'm glad I waited to get some benchmark results before
submitting that version.

>>> A gradual and linear back-off from the current timeline is
>>> more of a fair negotiation process between vcpus and
>>> results in more or less sane (and fair) scheduling, and no
>>> unnecessary looping.
>>>
>>> You could even do an exponential backoff up to a limit of
>>> 1-10 msecs or so, starting at 100 usecs.
>>
>> Good idea, it eliminates another variable to be tuned.
>
> It could be made fully self-tuning, if the filter threshold can be
> tuned fast enough. (an MSR write? A VM context field update?)

VMCB field update.

So the suggestion is to add a function similar to set_task_cpu()
that increases the vruntime with an exponential backoff?  Is that
sufficient to cause a new VCPU to step in?  I'm obviously not very
familiar with the scheduler code.

> I.e. the 3000 cycles value itself could be eliminated as well. (with
> just a common-sense max of say 100,000 cycles enforced)

I don't understand what you're saying here.  There needs to be some
value in the pause filter counter to trigger the intercept.

-Mark Langsdorf
Operating System Research Center
AMD
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u16 intercept_dr_write;
 	u32 intercept_exceptions;
 	u64 intercept;
-	u8 reserved_1[44];
+	u8 reserved_1[42];
+	u16 pause_filter_count;
 	u64 iopm_base_pa;
 	u64 msrpm_base_pa;
 	u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..4279141 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -45,6 +45,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
@@ -575,6 +576,12 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	svm->nested_vmcb = 0;
 	svm->vcpu.arch.hflags = HF_GIF_MASK;
+
+	if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+		control->pause_filter_count = 5000;
+		control->intercept |= (1ULL << INTERCEPT_PAUSE);
+	}
+
 }
 
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -2087,6 +2094,15 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
 	return 1;
 }
 
+static int pause_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	/* Simple yield */
+	vcpu_put(&svm->vcpu);
+	schedule();
+	vcpu_load(&svm->vcpu);
+	return 1;
+}
+
 static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 				  struct kvm_run *kvm_run) = {
 	[SVM_EXIT_READ_CR0]			= emulate_on_interception,
@@ -2123,6 +2139,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_CPUID]			= cpuid_interception,
 	[SVM_EXIT_IRET]				= iret_interception,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
+	[SVM_EXIT_PAUSE]			= pause_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
 	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2b73e19..e2b730d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -710,6 +710,7 @@ void vcpu_load(struct kvm_vcpu *vcpu)
 	kvm_arch_vcpu_load(vcpu, cpu);
 	put_cpu();
 }
+EXPORT_SYMBOL_GPL(vcpu_load);
 
 void vcpu_put(struct kvm_vcpu *vcpu)
 {
@@ -719,6 +720,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
 	preempt_enable();
 	mutex_unlock(&vcpu->mutex);
 }
+EXPORT_SYMBOL_GPL(vcpu_put);
 
 static void ack_flush(void *_completed)
 {
commit 01813db8627e74018c8cec90df7e345839351f23
Author: root <root@xendinar01.amd.com>
Date:   Thu May 7 09:44:10 2009 -0500

    New AMD processors will support the Pause Filter Feature.
    This feature creates a new field in the VMCB called Pause
    Filter Count. If Pause Filter Count is greater than 0 and
    intercepting PAUSEs is enabled, the processor will increment
    an internal counter when a PAUSE instruction occurs instead
    of intercepting. When the internal counter reaches the
    Pause Filter Count value, a PAUSE intercept will occur.

    This feature can be used to detect contended spinlocks,
    especially when the lock holding VCPU is not scheduled.
    Rescheduling another VCPU prevents the VCPU seeking the
    lock from wasting its quantum by spinning idly.

    Experimental results show that most spinlocks are held
    for less than 1000 PAUSE cycles or more than a few
    thousand. Default the Pause Filter Counter to 3000 to
    detect the contended spinlocks.

    Processor support for this feature is indicated by a CPUID
    bit.

    On a 24 core system running 4 guests each with 16 VCPUs,
    this patch improved overall performance of each guest's
    32 job kernbench by approximately 1%. Further performance
    improvement may be possible with a more sophisticated
    yield algorithm.

    -Mark Langsdorf
    Operating System Research Center
    AMD

    Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>