Message ID | 20210727111247.55510-1-lirongqing@baidu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] KVM: use cpu_relax when halt polling | expand |
> SMT siblings share caches and other hardware, and busy halt polling > will degrade its sibling performance if its sibling is working > > Sean Christopherson suggested as below: > > "Rather than disallowing halt-polling entirely, on x86 it should be > sufficient to simply have the hardware thread yield to its sibling(s) > via PAUSE. It probably won't get back all performance, but I would > expect it to be close. > This compiles on all KVM architectures, and AFAICT the intended usage > of cpu_relax() is identical for all architectures." For sure change to cpu_relax() is better. Was just curious to know if you got descent performance improvement compared to previously reported with Unixbench. Thanks, Pankaj > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Li RongQing <lirongqing@baidu.com> > --- > diff v1: using cpu_relax, rather that stop halt-polling > > virt/kvm/kvm_main.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 7d95126..1679728 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > ++vcpu->stat.generic.halt_poll_invalid; > goto out; > } > + cpu_relax(); > poll_end = cur = ktime_get(); > } while (kvm_vcpu_can_poll(cur, stop)); > } > -- > 2.9.4 >
> -----邮件原件----- > 发件人: Pankaj Gupta <pankaj.gupta.linux@gmail.com> > 发送时间: 2021年7月28日 2:16 > 收件人: Li,Rongqing <lirongqing@baidu.com> > 抄送: kvm@vger.kernel.org; Paolo Bonzini <pbonzini@redhat.com>; > seanjc@google.com > 主题: Re: [PATCH][v2] KVM: use cpu_relax when halt polling > > > SMT siblings share caches and other hardware, and busy halt polling > > will degrade its sibling performance if its sibling is working > > > > Sean Christopherson suggested as below: > > > > "Rather than disallowing halt-polling entirely, on x86 it should be > > sufficient to simply have the hardware thread yield to its sibling(s) > > via PAUSE. It probably won't get back all performance, but I would > > expect it to be close. > > This compiles on all KVM architectures, and AFAICT the intended usage > > of cpu_relax() is identical for all architectures." > > For sure change to cpu_relax() is better. > Was just curious to know if you got descent performance improvement > compared to previously reported with Unixbench. > > Thanks, > Pankaj The test as below: 1. run unixbench dhry2reg: ./Run -c 1 dhry2reg -i 1 without SMT disturbance, the score is 3172 with a {while(1)i++} SMT disturbance, the score is 1583 with a {while(1)(rep nop/pause)} SMT disturbance, the score is 1729.4 seems cpu_relax can not get back all performance , what wrong? 2. back to haltpoll run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1 without SMT disturbance, the score is 3172 with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu: without patch, the score is 1776.9 with my first patch, the score is 1782.3 with cpu_relax patch, the score is 1778 with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu: without patch, the score is 1929.9 with my first patch, the score is 2294.6 with cpu_relax patch, the score is 2005.3 cpu_relax give less than stop halt polling, but it should have little effect for redis-benchmark which get benefit from halt polling -Li > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > Signed-off-by: Li RongQing <lirongqing@baidu.com> > > --- > > diff v1: using cpu_relax, rather that stop halt-polling > > > > virt/kvm/kvm_main.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index > > 7d95126..1679728 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > > > ++vcpu->stat.generic.halt_poll_invalid; > > goto out; > > } > > + cpu_relax(); > > poll_end = cur = ktime_get(); > > } while (kvm_vcpu_can_poll(cur, stop)); > > } > > -- > > 2.9.4 > >
' > > > "Rather than disallowing halt-polling entirely, on x86 it should be > > > sufficient to simply have the hardware thread yield to its sibling(s) > > > via PAUSE. It probably won't get back all performance, but I would > > > expect it to be close. > > > This compiles on all KVM architectures, and AFAICT the intended usage > > > of cpu_relax() is identical for all architectures." > > > > For sure change to cpu_relax() is better. > > Was just curious to know if you got descent performance improvement > > compared to previously reported with Unixbench. > > > > Thanks, > > Pankaj > > The test as below: > > 1. run unixbench dhry2reg: ./Run -c 1 dhry2reg -i 1 > without SMT disturbance, the score is 3172 > with a {while(1)i++} SMT disturbance, the score is 1583 > with a {while(1)(rep nop/pause)} SMT disturbance, the score is 1729.4 > > seems cpu_relax can not get back all performance , what wrong? Maybe because of pause intercept filtering, comparatively Mayless VM Exits? > > > 2. back to haltpoll > run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1 > without SMT disturbance, the score is 3172 > > with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu: > without patch, the score is 1776.9 > with my first patch, the score is 1782.3 > with cpu_relax patch, the score is 1778 > > with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu: > without patch, the score is 1929.9 > with my first patch, the score is 2294.6 > with cpu_relax patch, the score is 2005.3 > > > cpu_relax give less than stop halt polling, but it should have little effect for redis-benchmark which get benefit from halt polling We are seeing improvement with cpu_relax() though not to the level of stopping the halt polling when sibling CPU running redis workload. For 90% case I think its expected to have similar performance. For 33% stopping halt poll gives better result because of the workload. Overall I think this patch helps and not impact performance in normal cases. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Best regards, Pankaj > > > -Li > > > > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > > Signed-off-by: Li RongQing <lirongqing@baidu.com> > > > --- > > > diff v1: using cpu_relax, rather that stop halt-polling > > > > > > virt/kvm/kvm_main.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index > > > 7d95126..1679728 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > > > > > ++vcpu->stat.generic.halt_poll_invalid; > > > goto out; > > > } > > > + cpu_relax(); > > > poll_end = cur = ktime_get(); > > > } while (kvm_vcpu_can_poll(cur, stop)); > > > } > > > -- > > > 2.9.4 > > >
> -----邮件原件----- > 发件人: Pankaj Gupta <pankaj.gupta.linux@gmail.com> > 发送时间: 2021年7月28日 14:12 > 收件人: Li,Rongqing <lirongqing@baidu.com> > 抄送: kvm@vger.kernel.org; Paolo Bonzini <pbonzini@redhat.com>; > seanjc@google.com > 主题: Re: [PATCH][v2] KVM: use cpu_relax when halt polling > > ' > > > > "Rather than disallowing halt-polling entirely, on x86 it should > > > > be sufficient to simply have the hardware thread yield to its > > > > sibling(s) via PAUSE. It probably won't get back all performance, > > > > but I would expect it to be close. > > > > This compiles on all KVM architectures, and AFAICT the intended > > > > usage of cpu_relax() is identical for all architectures." > > > > > > For sure change to cpu_relax() is better. > > > Was just curious to know if you got descent performance improvement > > > compared to previously reported with Unixbench. > > > > > > Thanks, > > > Pankaj > > > > The test as below: > > > > 1. run unixbench dhry2reg: ./Run -c 1 dhry2reg -i 1 without SMT > > disturbance, the score is 3172 with a {while(1)i++} SMT disturbance, > > the score is 1583 with a {while(1)(rep nop/pause)} SMT disturbance, > > the score is 1729.4 > > > > seems cpu_relax can not get back all performance , what wrong? > > Maybe because of pause intercept filtering, comparatively Mayless VM Exits? > In vm; I retest it in bare metal, pause instruction works as expect, the score with "pause loop" disturbance is 2886; about 90% of no disturbance -Li > > > > > > 2. back to haltpoll > > run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1 without SMT > > disturbance, the score is 3172 > > > > with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu: > > without patch, the score is 1776.9 > > with my first patch, the score is 1782.3 with cpu_relax patch, the > > score is 1778 > > > > with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu: > > without patch, the score is 1929.9 > > with my first patch, the score is 2294.6 with cpu_relax patch, the > > score is 2005.3 > > > > > > cpu_relax give less than stop halt polling, but it should have little > > effect for redis-benchmark which get benefit from halt polling > > We are seeing improvement with cpu_relax() though not to the level of stopping > the halt polling when sibling CPU running redis workload. For 90% case I think its > expected to have similar performance. > > For 33% stopping halt poll gives better result because of the workload. Overall I > think this patch helps and not impact performance in normal cases. > > Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> > > Best regards, > Pankaj > > > > > > > > -Li > > > > > > > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > > > Signed-off-by: Li RongQing <lirongqing@baidu.com> > > > > --- > > > > diff v1: using cpu_relax, rather that stop halt-polling > > > > > > > > virt/kvm/kvm_main.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index > > > > 7d95126..1679728 100644 > > > > --- a/virt/kvm/kvm_main.c > > > > +++ b/virt/kvm/kvm_main.c > > > > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > > > > > > > ++vcpu->stat.generic.halt_poll_invalid; > > > > goto out; > > > > } > > > > + cpu_relax(); > > > > poll_end = cur = ktime_get(); > > > > } while (kvm_vcpu_can_poll(cur, stop)); > > > > } > > > > -- > > > > 2.9.4 > > > >
diff v1: using cpu_relax, rather that stop halt-polling virt/kvm/kvm_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7d95126..1679728 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) ++vcpu->stat.generic.halt_poll_invalid; goto out; } + cpu_relax(); poll_end = cur = ktime_get(); } while (kvm_vcpu_can_poll(cur, stop)); }
SMT siblings share caches and other hardware, and busy halt polling will degrade its sibling performance if its sibling is working Sean Christopherson suggested as below: "Rather than disallowing halt-polling entirely, on x86 it should be sufficient to simply have the hardware thread yield to its sibling(s) via PAUSE. It probably won't get back all performance, but I would expect it to be close. This compiles on all KVM architectures, and AFAICT the intended usage of cpu_relax() is identical for all architectures." Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> ---