
high CPU stolen time after live migrate

Message ID 0fb7b738-5637-4e9a-ad2e-6b61a894348a@default (mailing list archive)
State New, archived

Commit Message

Dongli Zhang Oct. 8, 2017, 5:29 a.m. UTC
Hi Dario and Olivier,

I have encountered this issue before. While the fix mentioned in the
link is effective, I assume it was derived from upstream Linux, and it can
introduce a new error, as described below.

Although there is a bug in the guest kernel, I think the root cause is on
the hypervisor side.

From my own tests, the issue is reproducible even when migrating a VM locally
within the same dom0. Once the guest VM is migrated, the RUNSTATE_offline time
looks normal, while the RUNSTATE_runnable time moves backward (decreases).
Therefore the value returned by paravirt_steal_clock() (actually
xen_steal_clock()), which is the sum of RUNSTATE_offline and
RUNSTATE_runnable, decreases as well. However, kernels such as 4.8 cannot
handle this case correctly, because the code in cputime.c is not written
specifically for the Xen hypervisor: it subtracts the previous reading as an
unsigned 64-bit value, so a decrease wraps around to a very large number.
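To illustrate why a decreasing runstate sum is harmful, here is a minimal user-space sketch of the unpatched v4.8 arithmetic (the `-` lines of the patch): the steal value is a `u64`, modeled as `uint64_t`, and `steal -= prev_steal_time` wraps when the new reading is smaller. The struct `runstate_ns`, the helpers `model_steal_clock()` and `unpatched_steal_delta()`, and all numbers are hypothetical; only the unsigned subtraction and the `min(..., maxtime)` cap mirror the kernel code.

```c
#include <stdint.h>

/* Hypothetical runstate times in nanoseconds (illustrative, not Xen's struct). */
struct runstate_ns {
    uint64_t runnable;
    uint64_t offline;
};

/* Models xen_steal_clock(): steal time is runnable + offline. */
uint64_t model_steal_clock(const struct runstate_ns *rs)
{
    return rs->runnable + rs->offline;
}

/*
 * Models the unpatched logic: an unsigned delta against the previous
 * reading, capped at maxtime. If steal < prev, the subtraction wraps
 * to a value close to 2^64, so the cap is hit instead of yielding 0.
 */
uint64_t unpatched_steal_delta(uint64_t steal, uint64_t prev,
                               uint64_t maxtime)
{
    uint64_t delta = steal - prev;
    return delta < maxtime ? delta : maxtime;
}
```

For example, if runnable drops from 5,000,000 ns to 3,000,000 ns while offline stays at 1,000,000 ns, the steal reading goes from 6,000,000 ns to 4,000,000 ns. The unsigned delta then becomes 2^64 minus 2,000,000 ns instead of going negative, so `unpatched_steal_delta()` returns `maxtime` rather than zero.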

For kernels like v4.8-rc8, would something like the patch below be better?


This issue does not seem to be completely fixed even in the latest upstream
Linux (I have tested with 4.12.0-rc7), where the symptom is different. After
live migration, although the steal clock counter does not overflow (i.e., it
does not become a very large unsigned number), the steal counter in /proc/stat
moves backward (e.g., from 329 to 311).

test@vm:~$ cat /proc/stat 
cpu  248 0 240 31197 893 0 1 329 0 0 
cpu0 248 0 240 31197 893 0 1 329 0 0 
intr 39051 16307 0 0 0 0 0 990 127 592 1004 1360 40 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 
ctxt 59400 
btime 1506731352 
processes 1877 
procs_running 1 
procs_blocked 0 
softirq 38903 0 15524 1227 6904 0 0 6 0 0 15242 

After live migration, the steal counter in an Ubuntu guest running 4.12.0-rc7 decreased to 311.

test@vm:~$ cat /proc/stat 
cpu  251 0 242 31245 893 0 1 311 0 0 
cpu0 251 0 242 31245 893 0 1 311 0 0 
intr 39734 16404 0 0 0 0 0 1440 128 0 8 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 
ctxt 60880 
btime 1506731352 
processes 1882 
procs_running 3 
procs_blocked 0 
softirq 39195 0 15618 1286 6958 0 0 7 0 0 15326
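To check for this regression from inside a guest, one can compare the steal field before and after migration. The sketch below assumes the standard /proc/stat layout documented in proc(5), where steal is the 8th number on each `cpu` line (in USER_HZ units); `parse_steal()` and `steal_went_backward()` are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Extracts the steal field (the 8th value) from a /proc/stat "cpu" or
 * "cpuN" line. Assumed field order: user nice system idle iowait irq
 * softirq steal guest guest_nice. Returns 0 on success, -1 if the line
 * has fewer than 8 numeric fields.
 */
int parse_steal(const char *line, unsigned long long *steal)
{
    unsigned long long v[8];
    /* %*s skips the "cpu"/"cpuN" label; only the 8 numbers are counted. */
    int n = sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n != 8)
        return -1;
    *steal = v[7];
    return 0;
}

/* Returns 1 if the steal counter decreased between two samples, else 0. */
int steal_went_backward(unsigned long long before, unsigned long long after)
{
    return after < before;
}
```

Applied to the two `cpu` lines above, `parse_steal()` yields 329 and 311, and `steal_went_backward(329, 311)` returns 1.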

I assume this is not expected behavior. A different patch against upstream
Linux, similar to the one for v4.8-rc8, would fix this issue.

---------------------------------------------------------

Whatever fix is applied on the guest kernel side, I think the root cause is
that the Xen hypervisor returns a RUNSTATE_runnable time smaller than the
value it reported before live migration.

As I am not familiar enough with Xen scheduling, I do not understand why the
RUNSTATE_runnable time decreases after live migration.

Dongli Zhang 



----- Original Message -----
From: dario.faggioli@citrix.com
To: xen.list@daevel.fr, xen-users@lists.xensource.com
Cc: xen-devel@lists.xen.org
Sent: Tuesday, October 3, 2017 5:24:49 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
Subject: Re: [Xen-devel] high CPU stolen time after live migrate

On Mon, 2017-10-02 at 18:37 +0200, Olivier Bonvalet wrote:
> root! laussor:/proc# cat /proc/uptime 
> 652005.23 2631328.82
> 
> 
> Values for "stolen time" in /proc/stat seem impossible with only 7
> days of uptime.
> 
I think it can be this:
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/

What's the version of your guest kernel?

Dario

Comments

Jan Beulich Oct. 9, 2017, 8:26 a.m. UTC | #1
>>> On 08.10.17 at 07:29, <dongli.zhang@oracle.com> wrote:
> Whatever the fix would be applied to guest kernel side, I think the root cause
> is because xen hypervisor returns a RUNSTATE_runnable time less than the
> previous one before live migration.
> 
> As I am not clear enough with xen scheduling, I do not understand why
> RUNSTATE_runnable cputime is decreased after live migration.

Isn't this simply because accounting starts from zero again in the
new (migrated) domain? If so, that is not something that ought to change;
it would still be up to the guest kernel to deal with it, if it
matters to it.

Jan
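Following the point that the guest must cope with accounting restarting in the new domain, one hypothetical guest-side approach is to wrap the raw steal clock with a per-CPU offset so that the value seen by the scheduler never decreases. `struct mono_steal` and `mono_steal_read()` are illustrative names, not existing kernel APIs, and this is an alternative sketch rather than the cputime.c patch.

```c
#include <stdint.h>

/*
 * Hypothetical per-CPU state for a monotonic steal clock.
 * offset: compensation added to the raw hypervisor value.
 * last:   the last value returned to the caller.
 */
struct mono_steal {
    uint64_t offset;
    uint64_t last;
};

/*
 * Returns raw + offset, raising offset whenever that sum would fall
 * below the previously returned value, so the result never decreases.
 * (Ignores u64 overflow of raw + offset, which nanosecond values will
 * not reach in practice.)
 */
uint64_t mono_steal_read(struct mono_steal *ms, uint64_t raw)
{
    if (raw + ms->offset < ms->last)
        ms->offset = ms->last - raw;
    ms->last = raw + ms->offset;
    return ms->last;
}
```

Compared with resynchronizing `prev_steal_time` in cputime.c, this keeps the clock itself monotonic for every consumer, at the cost of silently dropping whatever steal time the raw value lost across the migration.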

Patch

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a846cf8..3546e21 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -274,11 +274,17 @@  static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 	if (static_key_false(&paravirt_steal_enabled)) {
 		cputime_t steal_cputime;
 		u64 steal;
+		s64 steal_diff;
 
 		steal = paravirt_steal_clock(smp_processor_id());
-		steal -= this_rq()->prev_steal_time;
+		steal_diff = steal - this_rq()->prev_steal_time;
 
-		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
+		if (steal_diff < 0) {
+			this_rq()->prev_steal_time = steal;
+			return 0;
+		}
+
+		steal_cputime = min(nsecs_to_cputime(steal_diff), maxtime);
 		account_steal_time(steal_cputime);
 		this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
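The effect of the patch can be checked outside the kernel with a small model. This is a sketch under stated assumptions: `struct model_rq` and `patched_steal_account()` are hypothetical, cputime is taken to be nanoseconds so `nsecs_to_cputime()` and `cputime_to_nsecs()` become identity conversions, and the call to `account_steal_time()` is replaced by the return value.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue state used by the patch. */
struct model_rq {
    uint64_t prev_steal_time;
};

/*
 * Models the patched steal_account_process_time(), with nanoseconds as
 * the cputime unit. Returns the steal time that would be accounted.
 */
uint64_t patched_steal_account(struct model_rq *rq, uint64_t steal,
                               uint64_t maxtime)
{
    /* Like the patch, reinterpret the unsigned difference as signed
     * (two's-complement conversion, as GCC and Clang perform it). */
    int64_t steal_diff = (int64_t)(steal - rq->prev_steal_time);
    uint64_t accounted;

    if (steal_diff < 0) {
        /* Clock went backward: resync the baseline, account nothing. */
        rq->prev_steal_time = steal;
        return 0;
    }

    /* Cap at maxtime, then advance the baseline by what was accounted. */
    accounted = (uint64_t)steal_diff < maxtime ? (uint64_t)steal_diff
                                               : maxtime;
    rq->prev_steal_time += accounted;
    return accounted;
}
```

For example, with `prev_steal_time` at 6,000,000 ns and a post-migration reading of 4,000,000 ns, the call returns 0 and lowers the baseline to 4,000,000 ns. A following reading of 4,500,000 ns is then accounted as 500,000 ns.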