From patchwork Wed Jun 8 03:05:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 9163259
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Wanpeng Li, Ingo Molnar, "Peter Zijlstra (Intel)", Rik van Riel,
 Thomas Gleixner, Frederic Weisbecker, Paolo Bonzini, Radim Krčmář
Subject: [PATCH v5 2/3] sched/cputime: Fix prev steal time accounting during cpu hotplug
Date: Wed, 8 Jun 2016 11:05:09 +0800
Message-Id: <1465355110-21714-2-git-send-email-wanpeng.li@hotmail.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1465355110-21714-1-git-send-email-wanpeng.li@hotmail.com>
References: <1465355110-21714-1-git-send-email-wanpeng.li@hotmail.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List:
 kvm@vger.kernel.org

From: Wanpeng Li

Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU
hotplug") set rq->prev_* to 0 when a CPU comes back online after hotplug,
in order to fix the following scenario:

| steal is smaller than rq->prev_steal_time we end up with an insane large
| value which then gets added to rq->prev_steal_time, resulting in a permanent
| wreckage of the accounting.

However, it is still buggy.

rq->prev_steal_time = 0:

As Rik pointed out:

| setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from
| the host, could result in a very large [value] of steal_jiffies.

rq->prev_steal_time_rq = 0:

| steal = paravirt_steal_clock(cpu_of(rq));
| steal -= rq->prev_steal_time_rq;
|
| if (unlikely(steal > delta))
|         steal = delta;
|
| rq->prev_steal_time_rq += steal;
| delta -= steal;
|
| rq->clock_task += delta;

Here steal is a giant value and rq->prev_steal_time_rq is 0, so steal is
clamped to delta on every update and rq->prev_steal_time_rq grows only in
delta-sized steps. Because delta is 0 once the steal time has been
subtracted from the normal execution time, rq->clock_task cannot advance
until rq->prev_steal_time_rq catches up with the steal clock. That is why
I observed cpuhp/1-12 continuing to run until rq->prev_steal_time_rq
caught up with the steal clock timestamp.

I believe rq->prev_irq_time has a similar issue.

So this patch fixes it by reverting commit e9532e69b8d1.

Fixes: e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
Acked-by: Rik van Riel
Cc: Ingo Molnar
Cc: Peter Zijlstra (Intel)
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Frederic Weisbecker
Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
v4 -> v5:
 * revert commit e9532e69b8d1

 kernel/sched/core.c  |  1 -
 kernel/sched/sched.h | 13 -------------
 2 files changed, 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..7d45bb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7213,7 +7213,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
 	struct rq *rq = cpu_rq(cpu);
 
 	rq->calc_load_update = calc_load_update;
-	account_reset_rq(rq);
 	update_max_interval();
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..de607e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1809,16 +1809,3 @@ static inline void cpufreq_trigger_update(u64 time) {}
 #else /* arch_scale_freq_capacity */
 #define arch_scale_freq_invariant()	(false)
 #endif
-
-static inline void account_reset_rq(struct rq *rq)
-{
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-	rq->prev_irq_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT
-	rq->prev_steal_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
-	rq->prev_steal_time_rq = 0;
-#endif
-}