| Message ID | 1501309377-195256-1-git-send-email-longpeng2@huawei.com (mailing list archive) |
|---|---|
| State | New, archived |
[no idea if this change makes sense (and especially if it has any bad
side effects), do you have performance numbers? I'll just have a look at
the general structure of the patch in the meanwhile]

> +bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)

kvm_arch_vcpu_in_kernel() ?

> +{
> +	return kvm_x86_ops->get_cpl(vcpu) == 0;
> +}
> +
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 648b34c..f8f0d74 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>  	} spin_loop;
>  #endif
>  	bool preempted;
> +	/* If vcpu is in kernel-mode when preempted */
> +	bool in_kernmode;
> +

Why do you have to store that ...

[...]
> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>  	kvm_vcpu_set_in_spin_loop(me, true);
>  	/*
>  	 * We boost the priority of a VCPU that is runnable but not
> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  			continue;
>  		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>  			continue;
> +		if (me->in_kernmode && !vcpu->in_kernmode)

Wouldn't it be easier to simply have

in_kernel = kvm_arch_vcpu_in_kernel(me);
...
if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
...

> +			continue;
>  		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>  			continue;
>  
> @@ -4009,8 +4013,11 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>  
> -	if (current->state == TASK_RUNNING)
> +	if (current->state == TASK_RUNNING) {
>  		vcpu->preempted = true;
> +		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
> +	}
> +

so you don't have to do this change, too.

>  	kvm_arch_vcpu_put(vcpu);
>  }
> 
Hi David,

On 2017/7/31 19:31, David Hildenbrand wrote:
> [no idea if this change makes sense (and especially if it has any bad
> side effects), do you have performance numbers? I'll just have a look at
> the general structure of the patch in the meanwhile]
>

I don't have any test results yet; could you give me some suggestions
about which benchmarks are suitable?

>> +bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
>
> kvm_arch_vcpu_in_kernel() ?
>

Um...yes, I'll correct this.

>> +{
>> +	return kvm_x86_ops->get_cpl(vcpu) == 0;
>> +}
>> +
>>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>>  {
>>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 648b34c..f8f0d74 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>>  	} spin_loop;
>>  #endif
>>  	bool preempted;
>> +	/* If vcpu is in kernel-mode when preempted */
>> +	bool in_kernmode;
>> +
>
> Why do you have to store that ...
>
> [...]
>> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>>  	kvm_vcpu_set_in_spin_loop(me, true);
>>  	/*
>>  	 * We boost the priority of a VCPU that is runnable but not
>> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  			continue;
>>  		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>>  			continue;
>> +		if (me->in_kernmode && !vcpu->in_kernmode)
>
> Wouldn't it be easier to simply have
>
> in_kernel = kvm_arch_vcpu_in_kernel(me);
> ...
> if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
> ...
>

I'm not sure whether getting the vcpu's privilege level is expensive on
all architectures, so I record it in kvm_sched_out() to minimize the
extra cycles spent in kvm_vcpu_on_spin().

>> +			continue;
>>  		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>  			continue;
>>
>> @@ -4009,8 +4013,11 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>>  {
>>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>>
>> -	if (current->state == TASK_RUNNING)
>> +	if (current->state == TASK_RUNNING) {
>>  		vcpu->preempted = true;
>> +		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
>> +	}
>> +
>
> so you don't have to do this change, too.
>
>>  	kvm_arch_vcpu_put(vcpu);
>>  }
>>
> I'm not sure whether getting the vcpu's privilege level is expensive on
> all architectures, so I record it in kvm_sched_out() to minimize the
> extra cycles spent in kvm_vcpu_on_spin().
>

As you only care about x86 right now either way, you can directly optimize
here for the good (here: x86) case (keeping changes and therefore
possible bugs minimal).
On Mon, 31 Jul 2017 20:08:14 +0800
"Longpeng (Mike)" <longpeng2@huawei.com> wrote:

> Hi David,
>
> On 2017/7/31 19:31, David Hildenbrand wrote:
>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 648b34c..f8f0d74 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -272,6 +272,9 @@ struct kvm_vcpu {
> >>  	} spin_loop;
> >>  #endif
> >>  	bool preempted;
> >> +	/* If vcpu is in kernel-mode when preempted */
> >> +	bool in_kernmode;
> >> +
> >
> > Why do you have to store that ...
> >
>
> > [...]
> >> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
> >>  	kvm_vcpu_set_in_spin_loop(me, true);
> >>  	/*
> >>  	 * We boost the priority of a VCPU that is runnable but not
> >> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>  			continue;
> >>  		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
> >>  			continue;
> >> +		if (me->in_kernmode && !vcpu->in_kernmode)
> >
> > Wouldn't it be easier to simply have
> >
> > in_kernel = kvm_arch_vcpu_in_kernel(me);
> > ...
> > if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
> > ...
> >
>
> I'm not sure whether getting the vcpu's privilege level is expensive on
> all architectures, so I record it in kvm_sched_out() to minimize the
> extra cycles spent in kvm_vcpu_on_spin().

As it is now, this handling looks a bit inconsistent. You only update
the field on sched-out via preemption _or_ if kvm_vcpu_on_spin is
called for the vcpu. In most contexts, this field will have stale
content.

Also, would checking for kernel mode be more expensive than the various
other checks already done in this function?

[I like David's suggestion.]

>
> >> +			continue;
> >>  		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> >>  			continue;
> >>
On 2017/7/31 20:31, Cornelia Huck wrote:
> On Mon, 31 Jul 2017 20:08:14 +0800
> "Longpeng (Mike)" <longpeng2@huawei.com> wrote:
>
>> Hi David,
>>
>> On 2017/7/31 19:31, David Hildenbrand wrote:
>
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index 648b34c..f8f0d74 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -272,6 +272,9 @@ struct kvm_vcpu {
>>>>  	} spin_loop;
>>>>  #endif
>>>>  	bool preempted;
>>>> +	/* If vcpu is in kernel-mode when preempted */
>>>> +	bool in_kernmode;
>>>> +
>>>
>>> Why do you have to store that ...
>>>
>>
>>> [...]
>>>> +	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
>>>>  	kvm_vcpu_set_in_spin_loop(me, true);
>>>>  	/*
>>>>  	 * We boost the priority of a VCPU that is runnable but not
>>>> @@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>>>  			continue;
>>>>  		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
>>>>  			continue;
>>>> +		if (me->in_kernmode && !vcpu->in_kernmode)
>>>
>>> Wouldn't it be easier to simply have
>>>
>>> in_kernel = kvm_arch_vcpu_in_kernel(me);
>>> ...
>>> if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
>>> ...
>>>
>>
>> I'm not sure whether getting the vcpu's privilege level is expensive on
>> all architectures, so I record it in kvm_sched_out() to minimize the
>> extra cycles spent in kvm_vcpu_on_spin().
>
> As it is now, this handling looks a bit inconsistent. You only update
> the field on sched-out via preemption _or_ if kvm_vcpu_on_spin is
> called for the vcpu. In most contexts, this field will have stale
> content.
>
> Also, would checking for kernel mode be more expensive than the various
> other checks already done in this function?
>
> [I like David's suggestion.]
>

Hi Cornelia & David,

I'll take your suggestion, thanks :)

>>
>>>> +			continue;
>>>>  		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>>>  			continue;
>>>>
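[Editorial note: for concreteness, a rough sketch of how the candidate loop in virt/kvm/kvm_main.c could look with David's suggestion applied. kvm_arch_vcpu_in_kernel() is the name David proposed; the two-pass and last_boosted_vcpu bookkeeping of the real kvm_vcpu_on_spin() is omitted, and this is not the code that was eventually merged. Note that, as Paolo points out below, reading the candidate's CPL directly is not free on Intel, which is why a cached flag stays part of the discussion.]

```c
void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	/* Query the yielding vCPU once, outside the candidate loop. */
	bool in_kernel = kvm_arch_vcpu_in_kernel(me);
	int i;

	kvm_vcpu_set_in_spin_loop(me, true);
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu == me)
			continue;
		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
			continue;
		/* Only boost candidates that were themselves running in the kernel. */
		if (in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
			continue;
		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
			continue;
		if (kvm_vcpu_yield_to(vcpu) > 0)
			break;
	}
	kvm_vcpu_set_in_spin_loop(me, false);
}
```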
On 31/07/2017 14:27, David Hildenbrand wrote:
>> I'm not sure whether getting the vcpu's privilege level is expensive on
>> all architectures, so I record it in kvm_sched_out() to minimize the
>> extra cycles spent in kvm_vcpu_on_spin().
>>
> As you only care about x86 right now either way, you can directly optimize
> here for the good (here: x86) case (keeping changes and therefore
> possible bugs minimal).

I agree with Cornelia that this is inconsistent, so you shouldn't update
me->in_kernmode in kvm_vcpu_on_spin. However, get_cpl requires
vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
vmx_read_guest_seg_ar -> vmcs_read32).

Alternatively, we can add a new callback kvm_x86_ops->sched_out to x86
KVM, and call vmx_get_cpl from the Intel implementation (vmx_sched_out).
This will cache the result until the next sched_in, so that
kvm_vcpu_on_spin can use it.

Paolo
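[Editorial note: a minimal sketch of the sched_out-callback alternative Paolo describes. The ->sched_out member, vmx_sched_out() and the cached_cpl field are hypothetical names invented here for illustration; they are not part of the posted patch or the kvm_x86_ops layout of that tree.]

```c
/* arch/x86/kvm/vmx.c -- illustrative sketch only */
static void vmx_sched_out(struct kvm_vcpu *vcpu)
{
	/*
	 * Runs from the preempt notifier while the vCPU (and its VMCS) is
	 * still loaded, so vmx_get_cpl() -- which ends up in
	 * vmx_read_guest_seg_ar() / vmcs_read32() -- is both legal and
	 * cheap here.  The cached value stays valid until the next sched_in.
	 */
	to_vmx(vcpu)->cached_cpl = vmx_get_cpl(vcpu);	/* cached_cpl: made-up field */
}
```

kvm_vcpu_on_spin() (via whatever arch helper is chosen) would then consult the cached value instead of issuing a VMCS read for a vCPU that is not currently loaded.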
On 29/07/17 07:22, Longpeng(Mike) wrote:
> We had discussed the idea here:
> https://www.spinics.net/lists/kvm/msg140593.html
>
> I think it's also suitable for other architectures.
>
> If the vcpu(me) exits because it is spinning on a usermode spinlock, then
> the spinlock-holder may be preempted in usermode or kernmode.
> But if the vcpu(me) is in kernmode, then the holder must be
> preempted in kernmode, so we should choose a vcpu in kernmode
> as the most eligible candidate.

That seems to preclude any form of locking between userspace and kernel
(which probably wouldn't be Linux). Are you sure that this form of
construct is not used anywhere? I have the feeling this patch could
break this scenario...

Thanks,

	M.
On 31/07/2017 15:42, Marc Zyngier wrote:
>> If the vcpu(me) exits because it is spinning on a usermode spinlock, then
>> the spinlock-holder may be preempted in usermode or kernmode.
>> But if the vcpu(me) is in kernmode, then the holder must be
>> preempted in kernmode, so we should choose a vcpu in kernmode
>> as the most eligible candidate.
>
> That seems to preclude any form of locking between userspace and kernel
> (which probably wouldn't be Linux). Are you sure that this form of
> construct is not used anywhere? I have the feeling this patch could
> break this scenario...

It's just a heuristic; it would only be broken if you overcommit, and it
would be just as broken as if KVM didn't implement directed yield at all.

Paolo
On 2017/7/31 21:20, Paolo Bonzini wrote:
> On 31/07/2017 14:27, David Hildenbrand wrote:
>>> I'm not sure whether getting the vcpu's privilege level is expensive on
>>> all architectures, so I record it in kvm_sched_out() to minimize the
>>> extra cycles spent in kvm_vcpu_on_spin().
>>>
>> As you only care about x86 right now either way, you can directly optimize
>> here for the good (here: x86) case (keeping changes and therefore
>> possible bugs minimal).
>
> I agree with Cornelia that this is inconsistent, so you shouldn't update
> me->in_kernmode in kvm_vcpu_on_spin. However, get_cpl requires
> vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
> vmx_read_guest_seg_ar -> vmcs_read32).
>

Hi Paolo,

It seems that the other architectures (e.g. arm/s390) don't need to cache
the result, but x86 does, so I need to move 'in_kernmode' into
kvm_vcpu_arch and only add this field to x86, right?

> Alternatively, we can add a new callback kvm_x86_ops->sched_out to x86
> KVM, and call vmx_get_cpl from the Intel implementation (vmx_sched_out).

In this approach, vmx_sched_out would only call vmx_get_cpl; isn't that a
bit redundant, since we could just call kvm_x86_ops->get_cpl at the right
place instead?

> This will cache the result until the next sched_in, so that

'until the next sched_in' --> Do we need to clear the result at sched_in?

> kvm_vcpu_on_spin can use it.
>
> Paolo
On 01/08/2017 05:26, Longpeng (Mike) wrote:
> On 2017/7/31 21:20, Paolo Bonzini wrote:
>> On 31/07/2017 14:27, David Hildenbrand wrote:
>>>> I'm not sure whether getting the vcpu's privilege level is expensive on
>>>> all architectures, so I record it in kvm_sched_out() to minimize the
>>>> extra cycles spent in kvm_vcpu_on_spin().
>>>>
>>> As you only care about x86 right now either way, you can directly optimize
>>> here for the good (here: x86) case (keeping changes and therefore
>>> possible bugs minimal).
>>
>> I agree with Cornelia that this is inconsistent, so you shouldn't update
>> me->in_kernmode in kvm_vcpu_on_spin. However, get_cpl requires
>> vcpu_load on Intel x86, so Mike's patch is necessary (vmx_get_cpl ->
>> vmx_read_guest_seg_ar -> vmcs_read32).
>>
>
> Hi Paolo,
>
> It seems that the other architectures (e.g. arm/s390) don't need to cache
> the result, but x86 does, so I need to move 'in_kernmode' into
> kvm_vcpu_arch and only add this field to x86, right?

That's another way to do it, and I like it.

>> This will cache the result until the next sched_in, so that
>
> 'until the next sched_in' --> Do we need to clear the result at sched_in?

No, thanks.

Paolo
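[Editorial note: a rough sketch of the x86-only variant agreed on above, i.e. caching the flag in the arch-specific part of the vCPU. The field name preempted_in_kernel and its exact placement are assumptions for illustration, not the code that was eventually merged.]

```c
/* arch/x86/include/asm/kvm_host.h -- sketch; existing members elided */
struct kvm_vcpu_arch {
	/* ... */
	bool preempted_in_kernel;	/* CPL == 0 the last time the vCPU was unloaded */
};

/* arch/x86/kvm/x86.c -- sketch */
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
	/* Cheap: no VMCS access needed when asked about an unloaded vCPU. */
	return vcpu->arch.preempted_in_kernel;
}
```

The field would still have to be refreshed somewhere the vCPU is loaded (e.g. on the sched-out/vcpu_put path), which is the part the sched_out-callback idea above addresses.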
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index d4b2ad1..2e2701d 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -98,6 +98,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return !!(vcpu->arch.pending_exceptions);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1a75c0b..2489f64 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -58,6 +58,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3f2884e..9d7c42e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2443,6 +2443,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_s390_vcpu_has_irq(vcpu, 0);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 void kvm_s390_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	atomic_or(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82a63c5..b5a2e53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8435,6 +8435,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return kvm_x86_ops->get_cpl(vcpu) == 0;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 648b34c..f8f0d74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -272,6 +272,9 @@ struct kvm_vcpu {
 	} spin_loop;
 #endif
 	bool preempted;
+	/* If vcpu is in kernel-mode when preempted */
+	bool in_kernmode;
+
 	struct kvm_vcpu_arch arch;
 	struct dentry *debugfs_dentry;
 };
@@ -797,6 +800,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..ca6a394 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -416,6 +416,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 		&& !v->arch.power_off && !v->arch.pause);
 }
 
+bool kvm_arch_vcpu_spin_kernmode(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 /* Just ensure a guest exit from a particular CPU */
 static void exit_vm_noop(void *info)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 82987d4..8d83caa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -290,6 +290,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
 	vcpu->preempted = false;
+	vcpu->in_kernmode = false;
 
 	r = kvm_arch_vcpu_init(vcpu);
 	if (r < 0)
@@ -2330,6 +2331,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	int pass;
 	int i;
 
+	me->in_kernmode = kvm_arch_vcpu_spin_kernmode(me);
 	kvm_vcpu_set_in_spin_loop(me, true);
 	/*
 	 * We boost the priority of a VCPU that is runnable but not
@@ -2351,6 +2353,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 			continue;
 		if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
 			continue;
+		if (me->in_kernmode && !vcpu->in_kernmode)
+			continue;
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 			continue;
 
@@ -4009,8 +4013,11 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-	if (current->state == TASK_RUNNING)
+	if (current->state == TASK_RUNNING) {
 		vcpu->preempted = true;
+		vcpu->in_kernmode = kvm_arch_vcpu_spin_kernmode(vcpu);
+	}
+
 	kvm_arch_vcpu_put(vcpu);
 }
 
We had discussed the idea here:
https://www.spinics.net/lists/kvm/msg140593.html

I think it's also suitable for other architectures.

If the vcpu(me) exits because it is spinning on a usermode spinlock, then
the spinlock-holder may be preempted in usermode or kernmode.
But if the vcpu(me) is in kernmode, then the holder must be
preempted in kernmode, so we should choose a vcpu in kernmode
as the most eligible candidate.

PS: I only implement the x86 arch currently because I'm not familiar
with the other architectures.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 arch/mips/kvm/mips.c       | 5 +++++
 arch/powerpc/kvm/powerpc.c | 5 +++++
 arch/s390/kvm/kvm-s390.c   | 5 +++++
 arch/x86/kvm/x86.c         | 5 +++++
 include/linux/kvm_host.h   | 4 ++++
 virt/kvm/arm/arm.c         | 5 +++++
 virt/kvm/kvm_main.c        | 9 ++++++++-
 7 files changed, 37 insertions(+), 1 deletion(-)