Message ID | 1634631160-67276-3-git-send-email-wanpengli@tencent.com (mailing list archive)
---|---
State | New, archived
Series | [v3,1/3] KVM: emulate: Don't inject #GP when emulating RDMPC if CR0.PE=0
On 19/10/21 10:12, Wanpeng Li wrote:
> -        if (kvm_vcpu_wake_up(vcpu))
> -                return;
> +        me = get_cpu();
> +
> +        if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> +                goto out;

This is racy.  You are basically doing the same check that
rcuwait_wake_up does, but without the memory barrier before.

Also here:

> +        if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
> +                WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);

it's better to do

        if (vcpu == ... &&
            !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
                goto out;

so that if the bug happens you do get a smp_send_reschedule() and fail
safely.

Paolo

> +                goto out;
> +        }
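To make the ordering concern concrete: rcuwait_wake_up() issues a full barrier before it reads w->task, so an open-coded rcuwait_active() peek in the caller would need the same ordering to be safe. A minimal sketch of what that would look like, assuming the barrier pairs with the waiter side of rcuwait_wait_event() (illustration only, not code from the series):

        /*
         * Sketch: order this CPU's earlier "wake condition" writes (pending
         * request/interrupt) before peeking at the rcuwait, mirroring
         * barrier (B) inside rcuwait_wake_up() itself.
         */
        smp_mb();
        if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) &&
            kvm_vcpu_wake_up(vcpu))
                goto out;

The thread below instead converges on keeping the check inside rcuwait_wake_up(), after its existing barrier, so callers do not have to reason about this pairing themselves.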
On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> On 19/10/21 10:12, Wanpeng Li wrote:
> > -        if (kvm_vcpu_wake_up(vcpu))
> > -                return;
> > +        me = get_cpu();
> > +
> > +        if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > +                goto out;
>
> This is racy.  You are basically doing the same check that rcuwait_wake_up
> does, but without the memory barrier before.

I was worried that was the case[*], but I didn't have the two hours it would have
taken me to verify there was indeed a problem :-)

The intent of the extra check was to avoid the locked instruction that comes with
disabling preemption via rcu_read_lock().  But thinking more, the extra op should
be little more than a basic arithmetic operation in the grand scheme on modern x86
since the cache line is going to be locked and written no matter what, either
immediately before or immediately after.

So with Paolo's other comment, maybe just this?  And if this doesn't provide the
desired performance boost, changes to the rcuwait behavior should go in separate
patch.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ec99f5b972c..ebc6d4f2fbfa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3333,11 +3333,22 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
          * vCPU also requires it to leave IN_GUEST_MODE.
          */
         me = get_cpu();
+
+        /*
+         * Avoid the moderately expensive "should kick" operation if this pCPU
+         * is currently running the target vCPU, in which case it's a KVM bug
+         * if the vCPU is in the inner run loop.
+         */
+        if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
+            !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
+                goto out;
+
         if (kvm_arch_vcpu_should_kick(vcpu)) {
                 cpu = READ_ONCE(vcpu->cpu);
                 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                         smp_send_reschedule(cpu);
         }
+out:
         put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);

[*] https://lkml.kernel.org/r/YWoOG40Ap0Islpu2@google.com
On Wed, 20 Oct 2021 at 01:34, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> > On 19/10/21 10:12, Wanpeng Li wrote:
> > > -        if (kvm_vcpu_wake_up(vcpu))
> > > -                return;
> > > +        me = get_cpu();
> > > +
> > > +        if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > > +                goto out;
> >
> > This is racy.  You are basically doing the same check that rcuwait_wake_up
> > does, but without the memory barrier before.
>
> I was worried that was the case[*], but I didn't have the two hours it would have
> taken me to verify there was indeed a problem :-)
>
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

I observe the main overhead of rcuwait_wake_up() is from rcu
operations, especially rcu_read_lock/unlock().

>
> So with Paolo's other comment, maybe just this?  And if this doesn't provide the
> desired performance boost, changes to the rcuwait behavior should go in separate
> patch.

Ok.

    Wanpeng
On 20/10/21 04:49, Wanpeng Li wrote:
>> The intent of the extra check was to avoid the locked instruction that comes with
>> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
>> be little more than a basic arithmetic operation in the grand scheme on modern x86
>> since the cache line is going to be locked and written no matter what, either
>> immediately before or immediately after.
>
> I observe the main overhead of rcuwait_wake_up() is from rcu
> operations, especially rcu_read_lock/unlock().

Do you have CONFIG_PREEMPT_RCU set?  If so, maybe something like this
would help:

diff --git a/kernel/exit.c b/kernel/exit.c
index fd1c04193e18..ca1e60a1234d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
         int ret = 0;
         struct task_struct *task;

-        rcu_read_lock();
-
         /*
          * Order condition vs @task, such that everything prior to the load
          * of @task is visible. This is the condition as to why the user called
@@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
          */
         smp_mb(); /* (B) */

+#ifdef CONFIG_PREEMPT_RCU
+        /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
+        if (!rcuwait_active(w))
+                return ret;
+#endif
+
+        rcu_read_lock();
+
         task = rcu_dereference(w->task);
         if (task)
                 ret = wake_up_process(task);

(If you don't, rcu_read_lock is essentially preempt_disable() and it
should not have a large overhead).  You still need the memory barrier
though, in order to avoid missed wakeups; shameless plug for my
article at https://lwn.net/Articles/847481/.

Paolo

>> So with Paolo's other comment, maybe just this?  And if this doesn't provide the
>> desired performance boost, changes to the rcuwait behavior should go in separate
>> patch.
> Ok.
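For context on the pairing Paolo refers to, here is a simplified sketch of the waiter side (modeled on rcuwait_wait_event(); condensed, not the exact kernel macro). The waiter publishes w->task before it tests the wake condition, which is why barrier (B) has to stay ahead of the new rcuwait_active() check on the waker side:

        prepare_to_rcuwait(w);          /* rcu_assign_pointer(w->task, current) */
        for (;;) {
                /* Implicit barrier (A), pairs with (B) in rcuwait_wake_up(). */
                set_current_state(TASK_INTERRUPTIBLE);
                if (condition)          /* e.g. a pending kick or request */
                        break;
                schedule();
        }
        finish_rcuwait(w);              /* clears w->task, back to TASK_RUNNING */

        /*
         * Without (B), the waker's load of w->task could be ordered before its
         * write of the condition: the waiter then sees the stale condition and
         * sleeps, while the waker sees w->task == NULL and skips the wakeup.
         */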
On Wed, 20 Oct 2021 at 14:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/10/21 04:49, Wanpeng Li wrote:
> >> The intent of the extra check was to avoid the locked instruction that comes with
> >> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> >> be little more than a basic arithmetic operation in the grand scheme on modern x86
> >> since the cache line is going to be locked and written no matter what, either
> >> immediately before or immediately after.
> >
> > I observe the main overhead of rcuwait_wake_up() is from rcu
> > operations, especially rcu_read_lock/unlock().
>
> Do you have CONFIG_PREEMPT_RCU set?  If so, maybe something like this would help:

Yes.

>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index fd1c04193e18..ca1e60a1234d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
>          int ret = 0;
>          struct task_struct *task;
>
> -        rcu_read_lock();
> -
>          /*
>           * Order condition vs @task, such that everything prior to the load
>           * of @task is visible. This is the condition as to why the user called
> @@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
>           */
>          smp_mb(); /* (B) */
>
> +#ifdef CONFIG_PREEMPT_RCU
> +        /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
> +        if (!rcuwait_active(w))
> +                return ret;
> +#endif
> +
> +        rcu_read_lock();
> +
>          task = rcu_dereference(w->task);
>          if (task)
>                  ret = wake_up_process(task);
>
> (If you don't, rcu_read_lock is essentially preempt_disable() and it
> should not have a large overhead).  You still need the memory barrier
> though, in order to avoid missed wakeups; shameless plug for my
> article at https://lwn.net/Articles/847481/.

You are right, the cost of rcu_read_lock() for preemptable RCU
introduces too much overhead, do you want to send a separate patch?

    Wanpeng
On 19/10/21 19:34, Sean Christopherson wrote:
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

There should be no locked instructions unless you're using
PREEMPT_RT/PREEMPT_RCU, no?  The preempt_disable count is in a percpu
variable.

> +        /*
> +         * Avoid the moderately expensive "should kick" operation if this pCPU
> +         * is currently running the target vCPU, in which case it's a KVM bug
> +         * if the vCPU is in the inner run loop.
> +         */
> +        if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
> +            !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
> +                goto out;
> +

It should not even be a problem if vcpu->mode == IN_GUEST_MODE, you
just set it to EXITING_GUEST_MODE without even the need for
atomic_cmpxchg.  I'll send a few patches out, since I think I found
some related issues.

Paolo
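A minimal sketch of the alternative Paolo describes (illustration only; the identifiers come from the existing kvm_vcpu_kick() context and the eventual patches may differ): because the kick runs on the very pCPU that owns the vCPU, the vCPU cannot currently be executing guest code, so a plain store is enough to make the next guest-entry attempt bail out, with no atomic_cmpxchg() required.

        if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
                /*
                 * No IPI needed, we're already on the target pCPU.  Just make
                 * sure the vCPU notices the kick before re-entering the guest.
                 */
                if (READ_ONCE(vcpu->mode) == IN_GUEST_MODE)
                        WRITE_ONCE(vcpu->mode, EXITING_GUEST_MODE);
                goto out;
        }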
On 20/10/21 12:02, Wanpeng Li wrote:
>> +#ifdef CONFIG_PREEMPT_RCU
>> +        /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
>> +        if (!rcuwait_active(w))
>> +                return ret;
>> +#endif
>> +
>> +        rcu_read_lock();
>> +
>>          task = rcu_dereference(w->task);
>>          if (task)
>>                  ret = wake_up_process(task);
>>
>> (If you don't, rcu_read_lock is essentially preempt_disable() and it
>> should not have a large overhead).  You still need the memory barrier
>> though, in order to avoid missed wakeups; shameless plug for my
>> article at https://lwn.net/Articles/847481/.
> You are right, the cost of rcu_read_lock() for preemptable RCU
> introduces too much overhead, do you want to send a separate patch?

Yes, I'll take care of this.  Thanks!

Paolo
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7851f3a1b5f7..1bc52eab0a7d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3314,8 +3314,15 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
         int me, cpu;

-        if (kvm_vcpu_wake_up(vcpu))
-                return;
+        me = get_cpu();
+
+        if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
+                goto out;
+
+        if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
+                WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);
+                goto out;
+        }

         /*
          * Note, the vCPU could get migrated to a different pCPU at any point
@@ -3324,12 +3331,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
          * IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
          * vCPU also requires it to leave IN_GUEST_MODE.
          */
-        me = get_cpu();
         if (kvm_arch_vcpu_should_kick(vcpu)) {
                 cpu = READ_ONCE(vcpu->cpu);
                 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                         smp_send_reschedule(cpu);
         }
+out:
         put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);