
[RFC,1/2] KVM: don't check for PF_VCPU when yielding

Message ID 1416931449-24585-2-git-send-email-dahi@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

David Hildenbrand Nov. 25, 2014, 4:04 p.m. UTC
As some architectures (e.g. s390) can't disable preemption while
entering/leaving the guest, they won't receive the yield in all situations.

kvm_enter_guest() has to be called with preemption disabled and will set
PF_VCPU. After that point, s390 for example reenables preemption and starts to
execute the guest. The thread might therefore be scheduled out between
kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set while
the thread is not running.

Please note that preemption has to stay enabled in order to correctly process
page faults on s390.

Current code takes PF_VCPU as a hint that the VCPU thread is running and
therefore needs no yield. yield_to() checks whether the target thread is running,
so let's use the inbuilt functionality to make it independent of PF_VCPU and
preemption.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Christian Borntraeger Nov. 26, 2014, 7:51 a.m. UTC | #1
Am 25.11.2014 um 17:04 schrieb David Hildenbrand:
> As some architectures (e.g. s390) can't disable preemption while
> entering/leaving the guest, they won't receive the yield in all situations.
> 
> kvm_enter_guest() has to be called with preemption disabled and will set
> PF_VCPU. After that point, s390 for example reenables preemption and starts to
> execute the guest. The thread might therefore be scheduled out between
> kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set while
> the thread is not running.
> 
> Please note that preemption has to stay enabled in order to correctly process
> page faults on s390.
> 
> Current code takes PF_VCPU as a hint that the VCPU thread is running and
> therefore needs no yield. yield_to() checks whether the target thread is running,
> so let's use the inbuilt functionality to make it independent of PF_VCPU and
> preemption.

This change is a trade-off.
PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway.
CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision. 

So I think this patch is probably not what we want in most cases.

> 
> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> ---
>  virt/kvm/kvm_main.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5b45330..184f52e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1782,10 +1782,6 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
>  	rcu_read_unlock();
>  	if (!task)
>  		return ret;
> -	if (task->flags & PF_VCPU) {
> -		put_task_struct(task);
> -		return ret;
> -	}
>  	ret = yield_to(task, 1);
>  	put_task_struct(task);
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Hildenbrand Nov. 26, 2014, 9:23 a.m. UTC | #2
> This change is a trade-off.
> PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway.
> CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision.   

Won't most of that part be covered by:
	if (!ACCESS_ONCE(vcpu->preempted))

vcpu->preempted is only set when scheduled out involuntarily. It is cleared
when scheduled in. s390 sets it manually, to speed up waking up a vcpu.

So when our task is scheduled in (and PF_VCPU is set), this check will already
avoid scheduler overhead in kvm_vcpu_on_spin() -- or am I missing something?

Christian Borntraeger Nov. 26, 2014, 9:31 a.m. UTC | #3
Am 26.11.2014 um 10:23 schrieb David Hildenbrand:
>> This change is a trade-off.
>> PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway.
>> CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision.   
> 
> Won't most of that part be covered by:
> 	if (!ACCESS_ONCE(vcpu->preempted))

Hmm, right. Checking vcpu->preempted and PF_VCPU might boil down to the same.
Would be good to have a performance regression test, though.

> 
> vcpu->preempted is only set when scheduled out involuntarily. It is cleared
> when scheduled in. s390 sets it manually, to speed up waking up a vcpu.
> 
> So when our task is scheduled in (and PF_VCPU is set), this check will already
> avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?
> 

CC Raghavendra KT. Could you rerun your kernbench/sysbench/ebizzy setup on x86 to see if the patch in this thread causes any regression? I think your commit 7bc7ae25b143 ("kvm: Iterate over only vcpus that are preempted") might have really made the PF_VCPU check unnecessary.

CC Michael Mueller, do we still have our yield performance setup handy to check if this patch causes any regression?


Christian

Raghavendra K T Nov. 28, 2014, 10:08 a.m. UTC | #4
Was able to test the patch, here is the result. I have not tested with
bigger VMs though. The results make it difficult to talk about any side
effect of the patch, if any.

System 16 core 32cpu (+ht) sandybridge
with 4 guests of 16vcpu each

+-----------+-----------+-----------+------------+-----------+
             kernbench (time taken lower is better)
+-----------+-----------+-----------+------------+-----------+
     base       %stdev      patched      %stdev    %improvement
+-----------+-----------+-----------+------------+-----------+
1x   53.1421     2.3086        54.6671     2.9673      -2.86966
2x   89.6858     6.4540        94.0626     6.8317      -4.88015
+-----------+-----------+-----------+------------+-----------+

+-----------+-----------+-----------+------------+-----------+
             ebizzy  (records/sec higher is better)
+-----------+-----------+-----------+------------+-----------+
     base        %stdev          patched      %stdev    %improvement
+-----------+-----------+-----------+------------+-----------+
1x 14523.2500     8.4388    14928.8750     3.0478       2.79294
2x  3338.8750     1.4592     3270.8750     2.3980      -2.03661
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
             dbench  (Throughput higher is better)
+-----------+-----------+-----------+------------+-----------+
     base       %stdev           patched      %stdev    %improvement
+-----------+-----------+-----------+------------+-----------+
1x  6386.4737     1.0487    6703.9113     1.2298       4.97047
2x  2571.4712     1.3733    2571.8175     1.6919       0.01347
+-----------+-----------+-----------+------------+-----------+

Raghu

On Wed, Nov 26, 2014 at 3:01 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> Am 26.11.2014 um 10:23 schrieb David Hildenbrand:
>>> This change is a trade-off.
>>> PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway.
>>> CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision.
>>
>> Won't most of that part be covered by:
>>       if (!ACCESS_ONCE(vcpu->preempted))
>
> Hmm, right. Checking vcpu->preempted and PF_VCPU might boil down to the same.
> Would be good to have a performance regression test, though.
>
>>
>> vcpu->preempted is only set when scheduled out involuntarily. It is cleared
>> when scheduled in. s390 sets it manually, to speed up waking up a vcpu.
>>
>> So when our task is scheduled in (and PF_VCPU is set), this check will already
>> avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?
>>
>
> CC Raghavendra KT. Could you rerun your kernbench/sysbench/ebizzy setup on x86 to see if the patch in this thread causes any regression? I think your commit 7bc7ae25b143 ("kvm: Iterate over only vcpus that are preempted") might have really made the PF_VCPU check unnecessary.
>
> CC Michael Mueller, do we still have our yield performance setup handy to check if this patch causes any regression?
>
>
> Christian
>
Christian Borntraeger Nov. 28, 2014, 10:58 a.m. UTC | #5
Am 28.11.2014 um 11:08 schrieb Raghavendra KT:
> Was able to test the patch, here is the result: I have not tested with
> bigger VMs though. Results make it difficult to talk about any side
> effect of
> patch if any.

Thanks a lot.

If our assumption is correct, then this patch should have no side effect on x86. Do you have a confident guess whether the numbers below mean: no change vs. regression vs. improvement?

Christian


> 
> System 16 core 32cpu (+ht) sandybridge
> with 4 guests of 16vcpu each
> 
> +-----------+-----------+-----------+------------+-----------+
>              kernbench (time taken lower is better)
> +-----------+-----------+-----------+------------+-----------+
>      base       %stdev      patched      %stdev    %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x   53.1421     2.3086        54.6671     2.9673      -2.86966
> 2x   89.6858     6.4540        94.0626     6.8317      -4.88015
> +-----------+-----------+-----------+------------+-----------+
> 
> +-----------+-----------+-----------+------------+-----------+
>              ebizzy  (records/sec higher is better)
> +-----------+-----------+-----------+------------+-----------+
>      base        %stdev          patched      %stdev    %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x 14523.2500     8.4388    14928.8750     3.0478       2.79294
> 2x  3338.8750     1.4592     3270.8750     2.3980      -2.03661
> +-----------+-----------+-----------+------------+-----------+
> +-----------+-----------+-----------+------------+-----------+
>              dbench  (Throughput higher is better)
> +-----------+-----------+-----------+------------+-----------+
>      base       %stdev           patched      %stdev    %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x  6386.4737     1.0487    6703.9113     1.2298       4.97047
> 2x  2571.4712     1.3733    2571.8175     1.6919       0.01347
> +-----------+-----------+-----------+------------+-----------+
> 
> Raghu
> 
> On Wed, Nov 26, 2014 at 3:01 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> Am 26.11.2014 um 10:23 schrieb David Hildenbrand:
>>>> This change is a trade-off.
>>>> PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway.
>>>> CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision.
>>>
>>> Won't most of that part be covered by:
>>>       if (!ACCESS_ONCE(vcpu->preempted))
>>
>> Hmm, right. Checking vcpu->preempted and PF_VCPU might boil down to the same.
>> Would be good to have a performance regression test, though.
>>
>>>
>>> vcpu->preempted is only set when scheduled out involuntarily. It is cleared
>>> when scheduled in. s390 sets it manually, to speed up waking up a vcpu.
>>>
>>> So when our task is scheduled in (and PF_VCPU is set), this check will already
>>> avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?
>>>
>>
>> CC Raghavendra KT. Could you rerun your kernbench/sysbench/ebizzy setup on x86 to see if the patch in this thread causes any regression? I think your commit 7bc7ae25b143 ("kvm: Iterate over only vcpus that are preempted") might have really made the PF_VCPU check unnecessary.
>>
>> CC Michael Mueller, do we still have our yield performance setup handy to check if this patch causes any regression?
>>
>>
>> Christian
>>
> 

Raghavendra K T Nov. 28, 2014, 11:40 a.m. UTC | #6
On 11/28/2014 04:28 PM, Christian Borntraeger wrote:
> Am 28.11.2014 um 11:08 schrieb Raghavendra KT:
>> Was able to test the patch, here is the result: I have not tested with
>> bigger VMs though. Results make it difficult to talk about any side
>> effect of
>> patch if any.
>
> Thanks a lot.
>
> If our assumption is correct, then this patch should have no side effect on x86. Do you have any confidence guess if the numbers below mean: no-change vs. regression vs improvement?
>

I am seeing a very small improvement in the <= 1x overcommit cases,
and a very slight regression for > 1x overcommit. But considering the
test environment noise, I do not see much effect from the patch.

But I admit, I have not explored deeply:
1. the assumption that vcpu->preempted approximately equals the PF_VCPU check,
2. whether it helps any future users of yield_to() beyond its current sole
user, virtualization.



David Hildenbrand Dec. 1, 2014, 9:54 a.m. UTC | #7
> On 11/28/2014 04:28 PM, Christian Borntraeger wrote:
> > Am 28.11.2014 um 11:08 schrieb Raghavendra KT:
> >> Was able to test the patch, here is the result: I have not tested with
> >> bigger VMs though. Results make it difficult to talk about any side
> >> effect of
> >> patch if any.
> >
> > Thanks a lot.
> >
> > If our assumption is correct, then this patch should have no side effect on x86. Do you have any confidence guess if the numbers below mean: no-change vs. regression vs improvement?
> >
> 
> I am seeing very small improvement in <= 1x commit cases
> and for >1x overcommit, a very slight regression. But considering the
> test environment noises, I do not see much effect from the
> patch.
> 
> But I admit, I have not explored deeply about,
> 1. assumption of preempted approximately equals PF_VCPU case logic,

PF_VCPU is only a hint that the target vcpu is executing the guest.
With preemption off, or on any architecture other than s390, PF_VCPU means
that the target vcpu is running and can't be preempted.

With preemption on and s390, however, this no longer holds, so the
check is not always correct.

> 2. whether it helps for any future usages of yield_to against current
> sole usage of virtualization.
> 
> 
> 
Thanks for your test!

Paolo Bonzini Dec. 3, 2014, 12:53 p.m. UTC | #8
On 28/11/2014 12:40, Raghavendra K T wrote:
> I am seeing very small improvement in <= 1x commit cases
> and for >1x overcommit, a very slight regression. But considering the
> test environment noises, I do not see much effect from the
> patch.

I think these results are the only ones that could be statistically significant:

                   base     %stdev      patched     %stdev    %improvement
kernbench 1x    53.1421     2.3086      54.6671     2.9673      -2.86966
dbench    1x  6386.4737     1.0487    6703.9113     1.2298       4.97047

and, of course :) one of them says things get worse and the other
says things get better.

Paolo

> But I admit, I have not explored deeply about,
> 1. assumption of preempted approximately equals PF_VCPU case logic,
> 2. whether it helps for any future usages of yield_to against current
> sole usage of virtualization.
> 
> 
> 
Paolo Bonzini Dec. 3, 2014, 1:02 p.m. UTC | #9
On 25/11/2014 17:04, David Hildenbrand wrote:
> As some architectures (e.g. s390) can't disable preemption while
> entering/leaving the guest, they won't receive the yield in all situations.
> 
> kvm_enter_guest() has to be called with preemption disabled and will set
> PF_VCPU. After that point, s390 for example reenables preemption and starts to
> execute the guest. The thread might therefore be scheduled out between
> kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set while
> the thread is not running.
> 
> Please note that preemption has to stay enabled in order to correctly process
> page faults on s390.
> 
> Current code takes PF_VCPU as a hint that the VCPU thread is running and
> therefore needs no yield. yield_to() checks whether the target thread is running,
> so let's use the inbuilt functionality to make it independent of PF_VCPU and
> preemption.
> 
> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> ---
>  virt/kvm/kvm_main.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5b45330..184f52e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1782,10 +1782,6 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
>  	rcu_read_unlock();
>  	if (!task)
>  		return ret;
> -	if (task->flags & PF_VCPU) {
> -		put_task_struct(task);
> -		return ret;
> -	}
>  	ret = yield_to(task, 1);
>  	put_task_struct(task);
>  
> 

Applied with a rewritten commit message:

KVM: don't check for PF_VCPU when yielding

kvm_enter_guest() has to be called with preemption disabled and will
set PF_VCPU.  Current code takes PF_VCPU as a hint that the VCPU thread
is running and therefore needs no yield.

However, the check on PF_VCPU is wrong on s390, where preemption
has to stay enabled in order to correctly process page faults.
Thus, s390 reenables preemption and starts to execute the guest.
The thread might be scheduled out between kvm_enter_guest() and
kvm_exit_guest(), resulting in PF_VCPU being set but not being run.
When this happens, the opportunity for directed yield is missed.

This check is already done in kvm_vcpu_on_spin() before calling
kvm_vcpu_yield_to():

        if (!ACCESS_ONCE(vcpu->preempted))
                continue;

so the check on PF_VCPU is superfluous in general, and this patch 
removes it.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Hildenbrand Dec. 3, 2014, 1:04 p.m. UTC | #10
> Applied with a rewritten commit message:
> 
> KVM: don't check for PF_VCPU when yielding
> 
> kvm_enter_guest() has to be called with preemption disabled and will
> set PF_VCPU.  Current code takes PF_VCPU as a hint that the VCPU thread
> is running and therefore needs no yield.
> 
> However, the check on PF_VCPU is wrong on s390, where preemption
> has to stay enabled in order to correctly process page faults.
> Thus, s390 reenables preemption and starts to execute the guest.
> The thread might be scheduled out between kvm_enter_guest() and
> kvm_exit_guest(), resulting in PF_VCPU being set but not being run.
> When this happens, the opportunity for directed yield is missed.
> 
> This check is already done in kvm_vcpu_on_spin() before calling
> kvm_vcpu_yield_to():
> 
>         if (!ACCESS_ONCE(vcpu->preempted))
>                 continue;
> 
> so the check on PF_VCPU is superfluous in general, and this patch 
> removes it.
> 
> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 

Perfect, thanks!

David


Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b45330..184f52e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1782,10 +1782,6 @@  int kvm_vcpu_yield_to(struct kvm_vcpu *target)
 	rcu_read_unlock();
 	if (!task)
 		return ret;
-	if (task->flags & PF_VCPU) {
-		put_task_struct(task);
-		return ret;
-	}
 	ret = yield_to(task, 1);
 	put_task_struct(task);