
[v4,0/3] High downtime with 95+ throttle pct

Message ID: 20190723134215.25095-1-yury-kotov@yandex-team.ru

Message

Yury Kotov July 23, 2019, 1:42 p.m. UTC
Hi,

V4:
* The test was simplified to prevent spurious failures.

V3:
* Rebase fixes (migrate_set_parameter -> migrate_set_parameter_int)

V2:
* Added a test
* Fixed qemu_cond_timedwait for qsp

I wrote a test for migration auto-converge and noticed something strange (a rough code sketch of the setup follows the list):
1. Enable auto-converge
2. Set max-bandwidth to 1 Gb/s
3. Set downtime-limit to 1 ms
4. Run the standard test (it just writes a byte per page)
5. Wait for convergence
6. The migration converges with a 99% throttle percentage
7. The resulting downtime is about 300-600 ms   <<<<
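
In terms of the tests/migration-test.c helpers, the setup for steps 1-3 is
roughly the following sketch (simplified; it assumes the usual
migrate_set_capability()/migrate_set_parameter_int() helpers and that
max-bandwidth is given in bytes per second):

/* Sketch only; the real test in this series differs in details. */
static void setup_auto_converge(QTestState *from)
{
    migrate_set_capability(from, "auto-converge", true);                 /* step 1 */
    migrate_set_parameter_int(from, "max-bandwidth", 125 * 1000 * 1000); /* step 2: 1 Gb/s */
    migrate_set_parameter_int(from, "downtime-limit", 1);                /* step 3: 1 ms */
    /*
     * Steps 4-7: start the guest workload and the migration, poll
     * query-migrate until cpu-throttle-percentage stops growing, then
     * compare the reported downtime with downtime-limit.
     */
}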

That is much higher than the expected 1 ms. I figured out that cpu_throttle_thread()
sleeps in the vCPU thread for 100+ ms at high throttle percentages (>= 95%),
and it keeps sleeping even after a CPU kick.

I fixed it by using a timed wait on the halt condition for the millisecond part
of the sleep, so a kick can wake the vCPU early,
e.g. timedwait(halt_cond, 1ms) + usleep(500).
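
Concretely, the sleep in cpu_throttle_thread() becomes roughly the following
(the cpus.c patch has the exact change):

    while (sleeptime_ns >= SCALE_MS && !cpu->stop) {
        int64_t start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
        /* Interruptible: a cpu kick signals halt_cond and ends the wait early. */
        qemu_cond_timedwait(cpu->halt_cond, &qemu_global_mutex,
                            sleeptime_ns / SCALE_MS);
        sleeptime_ns -= qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
    }
    /* Sleep the sub-millisecond remainder as before. */
    if (sleeptime_ns >= SCALE_US && !cpu->stop) {
        qemu_mutex_unlock_iothread();
        g_usleep(sleeptime_ns / SCALE_US);
        qemu_mutex_lock_iothread();
    }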

Regards,
Yury

Yury Kotov (3):
  qemu-thread: Add qemu_cond_timedwait
  cpus: Fix throttling during vm_stop
  tests/migration: Add a test for auto converge

 cpus.c                   |  27 +++++++---
 include/qemu/thread.h    |  18 +++++++
 tests/migration-test.c   | 103 ++++++++++++++++++++++++++++++++++-----
 util/qemu-thread-posix.c |  40 ++++++++++-----
 util/qemu-thread-win32.c |  16 ++++++
 util/qsp.c               |  18 +++++++
 6 files changed, 191 insertions(+), 31 deletions(-)

Comments

Yury Kotov Aug. 7, 2019, 7:42 a.m. UTC | #1
Ping

23.07.2019, 16:42, "Yury Kotov" <yury-kotov@yandex-team.ru>:
> [...]
Yury Kotov Aug. 15, 2019, 9:13 a.m. UTC | #2
Ping ping

07.08.2019, 10:42, "Yury Kotov" <yury-kotov@yandex-team.ru>:
> Ping
>
> [...]
Paolo Bonzini Aug. 19, 2019, 4:39 p.m. UTC | #3
On 15/08/19 11:13, Yury Kotov wrote:
> Ping ping

Hi,

sorry for the delay, I was waiting for the 4.1 release.

I would like to make a small change so that preemption of QEMU does not
result in overly long sleeps.  The following patch on top of yours computes
the throttle-end time just once.  Of course you can still be unlucky if
you are preempted at the wrong time, but the window becomes much smaller.

diff --git a/cpus.c b/cpus.c
index d091dbd..d7e2145 100644
--- a/cpus.c
+++ b/cpus.c
@@ -781,7 +781,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 {
     double pct;
     double throttle_ratio;
-    int64_t sleeptime_ns;
+    int64_t sleeptime_ns, end;
 
     if (!cpu_throttle_get_percentage()) {
         return;
@@ -792,18 +792,17 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
     /* Add 1ns to fix double's rounding error (like 0.9999999...) */
     sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
 
-    while (sleeptime_ns >= SCALE_MS && !cpu->stop) {
-        int64_t start, end;
-        start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-        qemu_cond_timedwait(cpu->halt_cond, &qemu_global_mutex,
-                            sleeptime_ns / SCALE_MS);
-        end = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-        sleeptime_ns -= end - start;
-    }
-    if (sleeptime_ns >= SCALE_US && !cpu->stop) {
-        qemu_mutex_unlock_iothread();
-        g_usleep(sleeptime_ns / SCALE_US);
-        qemu_mutex_lock_iothread();
+    end = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
+    while (sleeptime_ns > 0 && !cpu->stop) {
+        if (sleeptime_ns > SCALE_MS) {
+            qemu_cond_timedwait(cpu->halt_cond, &qemu_global_mutex,
+                                sleeptime_ns / SCALE_MS);
+        } else {
+            qemu_mutex_unlock_iothread();
+            g_usleep(sleeptime_ns / SCALE_US);
+            qemu_mutex_lock_iothread();
+        }
+        sleeptime_ns = end - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
     atomic_set(&cpu->throttle_thread_scheduled, 0);
 }
Paolo Bonzini Aug. 19, 2019, 5:11 p.m. UTC | #4
On 19/08/19 18:39, Paolo Bonzini wrote:
> On 15/08/19 11:13, Yury Kotov wrote:
>> Ping ping
> 
> Hi,
> 
> sorry for the delay, I was waiting for the 4.1 release.
> 
> I would like to make a small change so that preemption of QEMU does not
> result in overly long sleeps.  The following patch on top of yours computes
> the throttle-end time just once.  Of course you can still be unlucky if
> you are preempted at the wrong time, but the window becomes much smaller.

The new unit test is hanging for me on aarch64-softmmu.

Paolo

Yury Kotov Aug. 23, 2019, 8:27 a.m. UTC | #5
Hi,

19.08.2019, 20:11, "Paolo Bonzini" <pbonzini@redhat.com>:
> On 19/08/19 18:39, Paolo Bonzini wrote:
>>  On 15/08/19 11:13, Yury Kotov wrote:
>>>  Ping ping
>>
>>  Hi,
>>
>>  sorry for the delay, I was waiting for the 4.1 release.
>>
>>  I would like to make a small change so that preemption of QEMU does not
>>  result in overly long sleeps. The following patch on top of yours computes
>>  the throttle-end time just once. Of course you can still be unlucky if
>>  you are preempted at the wrong time, but the window becomes much smaller.

Thanks for your suggestion! I'll use it in v5.

>
> The new unit test is hanging for me on aarch64-softmmu.
>

Ok, I'll try to fix it in v5.

Regards,
Yury