diff mbox series

[v2] KVM: use cpu_relax when halt polling

Message ID 20210727111247.55510-1-lirongqing@baidu.com (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: use cpu_relax when halt polling | expand

Commit Message

Li RongQing July 27, 2021, 11:12 a.m. UTC
SMT siblings share caches and other hardware, and busy halt polling
will degrade its sibling performance if its sibling is working

Sean Christopherson suggested as below:

"Rather than disallowing halt-polling entirely, on x86 it should be
sufficient to simply have the hardware thread yield to its sibling(s)
via PAUSE.  It probably won't get back all performance, but I would
expect it to be close.
This compiles on all KVM architectures, and AFAICT the intended usage
of cpu_relax() is identical for all architectures."

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---

Comments

Pankaj Gupta July 27, 2021, 6:15 p.m. UTC | #1
> SMT siblings share caches and other hardware, and busy halt polling
> will degrade its sibling performance if its sibling is working
>
> Sean Christopherson suggested as below:
>
> "Rather than disallowing halt-polling entirely, on x86 it should be
> sufficient to simply have the hardware thread yield to its sibling(s)
> via PAUSE.  It probably won't get back all performance, but I would
> expect it to be close.
> This compiles on all KVM architectures, and AFAICT the intended usage
> of cpu_relax() is identical for all architectures."

For sure change to cpu_relax() is better.
Was just curious to know if you got descent performance improvement compared
to previously reported with Unixbench.

Thanks,
Pankaj
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> diff v1: using cpu_relax, rather that stop halt-polling
>
>  virt/kvm/kvm_main.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7d95126..1679728 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>                                         ++vcpu->stat.generic.halt_poll_invalid;
>                                 goto out;
>                         }
> +                       cpu_relax();
>                         poll_end = cur = ktime_get();
>                 } while (kvm_vcpu_can_poll(cur, stop));
>         }
> --
> 2.9.4
>
Li RongQing July 28, 2021, 2:57 a.m. UTC | #2
> -----邮件原件-----
> 发件人: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> 发送时间: 2021年7月28日 2:16
> 收件人: Li,Rongqing <lirongqing@baidu.com>
> 抄送: kvm@vger.kernel.org; Paolo Bonzini <pbonzini@redhat.com>;
> seanjc@google.com
> 主题: Re: [PATCH][v2] KVM: use cpu_relax when halt polling
> 
> > SMT siblings share caches and other hardware, and busy halt polling
> > will degrade its sibling performance if its sibling is working
> >
> > Sean Christopherson suggested as below:
> >
> > "Rather than disallowing halt-polling entirely, on x86 it should be
> > sufficient to simply have the hardware thread yield to its sibling(s)
> > via PAUSE.  It probably won't get back all performance, but I would
> > expect it to be close.
> > This compiles on all KVM architectures, and AFAICT the intended usage
> > of cpu_relax() is identical for all architectures."
> 
> For sure change to cpu_relax() is better.
> Was just curious to know if you got descent performance improvement
> compared to previously reported with Unixbench.
> 
> Thanks,
> Pankaj

The test as below:

1. run unixbench dhry2reg:  ./Run -c 1 dhry2reg -i 1
without SMT disturbance, the score is 3172
with a  {while(1)i++} SMT disturbance,  the score is 1583
with a  {while(1)(rep nop/pause)} SMT disturbance,  the score is 1729.4

seems cpu_relax can not get back all performance , what wrong?


2. back to haltpoll
run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1
without SMT disturbance, the score is 3172

with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu:
without patch, the score is 1776.9
with my first patch, the score is 1782.3
with cpu_relax patch, the score is 1778

with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu:
without patch, the score is 1929.9
with my first patch, the score is 2294.6
with cpu_relax patch, the score is 2005.3


cpu_relax give less than stop halt polling, but it should have little effect for redis-benchmark which get benefit from halt polling 


-Li

> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> > diff v1: using cpu_relax, rather that stop halt-polling
> >
> >  virt/kvm/kvm_main.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > 7d95126..1679728 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >
> ++vcpu->stat.generic.halt_poll_invalid;
> >                                 goto out;
> >                         }
> > +                       cpu_relax();
> >                         poll_end = cur = ktime_get();
> >                 } while (kvm_vcpu_can_poll(cur, stop));
> >         }
> > --
> > 2.9.4
> >
Pankaj Gupta July 28, 2021, 6:12 a.m. UTC | #3
'
> > > "Rather than disallowing halt-polling entirely, on x86 it should be
> > > sufficient to simply have the hardware thread yield to its sibling(s)
> > > via PAUSE.  It probably won't get back all performance, but I would
> > > expect it to be close.
> > > This compiles on all KVM architectures, and AFAICT the intended usage
> > > of cpu_relax() is identical for all architectures."
> >
> > For sure change to cpu_relax() is better.
> > Was just curious to know if you got descent performance improvement
> > compared to previously reported with Unixbench.
> >
> > Thanks,
> > Pankaj
>
> The test as below:
>
> 1. run unixbench dhry2reg:  ./Run -c 1 dhry2reg -i 1
> without SMT disturbance, the score is 3172
> with a  {while(1)i++} SMT disturbance,  the score is 1583
> with a  {while(1)(rep nop/pause)} SMT disturbance,  the score is 1729.4
>
> seems cpu_relax can not get back all performance , what wrong?

Maybe because of pause intercept filtering, comparatively Mayless VM Exits?

>
>
> 2. back to haltpoll
> run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1
> without SMT disturbance, the score is 3172
>
> with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu:
> without patch, the score is 1776.9
> with my first patch, the score is 1782.3
> with cpu_relax patch, the score is 1778
>
> with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu:
> without patch, the score is 1929.9
> with my first patch, the score is 2294.6
> with cpu_relax patch, the score is 2005.3
>
>
> cpu_relax give less than stop halt polling, but it should have little effect for redis-benchmark which get benefit from halt polling

We are seeing improvement with cpu_relax() though not to the level of
stopping the halt polling when sibling
CPU running redis workload. For 90% case I think its expected to have
similar performance.

For 33% stopping halt poll gives better result because of the
workload. Overall I think this patch helps and not impact
performance in normal cases.

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>

Best regards,
Pankaj


>
>
> -Li
>
> > >
> > > Suggested-by: Sean Christopherson <seanjc@google.com>
> > > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > > ---
> > > diff v1: using cpu_relax, rather that stop halt-polling
> > >
> > >  virt/kvm/kvm_main.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 7d95126..1679728 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> > >
> > ++vcpu->stat.generic.halt_poll_invalid;
> > >                                 goto out;
> > >                         }
> > > +                       cpu_relax();
> > >                         poll_end = cur = ktime_get();
> > >                 } while (kvm_vcpu_can_poll(cur, stop));
> > >         }
> > > --
> > > 2.9.4
> > >
Li RongQing July 28, 2021, 6:46 a.m. UTC | #4
> -----邮件原件-----
> 发件人: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> 发送时间: 2021年7月28日 14:12
> 收件人: Li,Rongqing <lirongqing@baidu.com>
> 抄送: kvm@vger.kernel.org; Paolo Bonzini <pbonzini@redhat.com>;
> seanjc@google.com
> 主题: Re: [PATCH][v2] KVM: use cpu_relax when halt polling
> 
> '
> > > > "Rather than disallowing halt-polling entirely, on x86 it should
> > > > be sufficient to simply have the hardware thread yield to its
> > > > sibling(s) via PAUSE.  It probably won't get back all performance,
> > > > but I would expect it to be close.
> > > > This compiles on all KVM architectures, and AFAICT the intended
> > > > usage of cpu_relax() is identical for all architectures."
> > >
> > > For sure change to cpu_relax() is better.
> > > Was just curious to know if you got descent performance improvement
> > > compared to previously reported with Unixbench.
> > >
> > > Thanks,
> > > Pankaj
> >
> > The test as below:
> >
> > 1. run unixbench dhry2reg:  ./Run -c 1 dhry2reg -i 1 without SMT
> > disturbance, the score is 3172 with a  {while(1)i++} SMT disturbance,
> > the score is 1583 with a  {while(1)(rep nop/pause)} SMT disturbance,
> > the score is 1729.4
> >
> > seems cpu_relax can not get back all performance , what wrong?
> 
> Maybe because of pause intercept filtering, comparatively Mayless VM Exits?
> 

In vm;

I retest it in bare metal, pause instruction works as expect, the score with "pause loop" disturbance is 2886; about 90% of no disturbance

-Li

> >
> >
> > 2. back to haltpoll
> > run unixbench dhry2reg ./Run -c 1 dhry2reg -i 1 without SMT
> > disturbance, the score is 3172
> >
> > with redis-benchmark SMT disturbance, redis-benchmark takes 90%cpu:
> > without patch, the score is 1776.9
> > with my first patch, the score is 1782.3 with cpu_relax patch, the
> > score is 1778
> >
> > with redis-benchmark SMT disturbance, redis-benchmark takes 33%cpu:
> > without patch, the score is 1929.9
> > with my first patch, the score is 2294.6 with cpu_relax patch, the
> > score is 2005.3
> >
> >
> > cpu_relax give less than stop halt polling, but it should have little
> > effect for redis-benchmark which get benefit from halt polling
> 
> We are seeing improvement with cpu_relax() though not to the level of stopping
> the halt polling when sibling CPU running redis workload. For 90% case I think its
> expected to have similar performance.
> 
> For 33% stopping halt poll gives better result because of the workload. Overall I
> think this patch helps and not impact performance in normal cases.
> 
> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
> 
> Best regards,
> Pankaj
> 
> 
> >
> >
> > -Li
> >
> > > >
> > > > Suggested-by: Sean Christopherson <seanjc@google.com>
> > > > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > > > ---
> > > > diff v1: using cpu_relax, rather that stop halt-polling
> > > >
> > > >  virt/kvm/kvm_main.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > > 7d95126..1679728 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -3110,6 +3110,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> > > >
> > > ++vcpu->stat.generic.halt_poll_invalid;
> > > >                                 goto out;
> > > >                         }
> > > > +                       cpu_relax();
> > > >                         poll_end = cur = ktime_get();
> > > >                 } while (kvm_vcpu_can_poll(cur, stop));
> > > >         }
> > > > --
> > > > 2.9.4
> > > >
diff mbox series

Patch

diff v1: using cpu_relax, rather that stop halt-polling

 virt/kvm/kvm_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7d95126..1679728 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3110,6 +3110,7 @@  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 					++vcpu->stat.generic.halt_poll_invalid;
 				goto out;
 			}
+			cpu_relax();
 			poll_end = cur = ktime_get();
 		} while (kvm_vcpu_can_poll(cur, stop));
 	}