From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Charles Arnold, Jan Beulich, Glen, George Dunlap, Tomas Mozes, Sarah Newman
Date: Thu, 12 Mar 2020 14:44:07 +0100
Message-ID: <158402056376.753.7091379488590272336.stgit@Palanthas>
X-Patchwork-Id: 11434305
Subject: [Xen-devel] [PATCH 0/2] xen: credit2: fix vcpu starvation due to too few credits

Hello everyone,

There have been reports of a Credit2 issue due to which vCPUs were being starved, to the point that the guest kernel would complain or even crash.
See the following xen-users and xen-devel threads:

  https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00018.html
  https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00015.html
  https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01158.html

I did some investigation, and figured out that the vCPUs in question are not scheduled for long time intervals because they somehow end up with an amount of credits which is less than the credits the idle vCPU has.

An example of this situation is shown here. In fact, we can see d0v1 sitting in the runqueue while all the CPUs are idle, as it has -1254238270 credits, which is smaller than -2^30 = -1073741824:

(XEN) Runqueue 0:
(XEN) ncpus = 28
(XEN) cpus = 0-27
(XEN) max_weight = 256
(XEN) pick_bias = 22
(XEN) instload = 1
(XEN) aveload = 293391 (~111%)
(XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
(XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
[...]
(XEN) Runqueue 0:
(XEN) CPU[00] runq=0, sibling=00,..., core=00,...
(XEN) CPU[01] runq=0, sibling=00,..., core=00,...
[...]
(XEN) CPU[26] runq=0, sibling=00,..., core=00,...
(XEN) CPU[27] runq=0, sibling=00,..., core=00,...
(XEN) RUNQ:
(XEN) 0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)

This happens because, although very rarely, vCPUs are allowed to execute for much longer than the scheduler would want them to.

For example, I have a trace showing that csched2_schedule() is invoked at t=57970746155ns. At t=57970747658ns (+1503ns) the s_timer is set to fire at t=57979485083ns, i.e., 8738928ns in the future. That's because the credit of snext is exactly that 8738928ns. Then, what I see is that the next call to burn_credits(), coming from csched2_schedule() for the same vCPU, happens at t=60083283617ns. That is *a lot* (2103798534ns) later than what we expected and asked for.
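
The numbers from the trace above can be checked with a small standalone C sketch. It is not Xen code: the helper names are hypothetical, it assumes one credit is burned per nanosecond (which is what this trace shows), and it assumes credits are a 32-bit signed quantity, so that "the minimum possible value" is INT32_MIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Idle vCPU credit as described in this mail: -2^30. */
#define IDLE_CREDIT_OLD (-(INT64_C(1) << 30))

/* ASSUMPTION: credits are 32-bit signed, so the lowest possible
 * idle credit would be INT32_MIN. */
#define IDLE_CREDIT_MIN ((int64_t)INT32_MIN)

/* Time elapsed between two timestamps, in ns. */
static int64_t elapsed_ns(int64_t start, int64_t now)
{
    return now - start;
}

/* Simplified model of credit burning: one credit per ns of runtime.
 * Hypothetical helper, not Xen's burn_credits(). */
static int64_t credit_after_burn(int64_t credit, int64_t delta_ns)
{
    return credit - delta_ns;
}

/* True if a runnable vCPU with this credit compares lower than idle,
 * i.e. idle would be picked instead of it. */
static bool loses_to_idle(int64_t credit, int64_t idle_credit)
{
    return credit < idle_credit;
}

/* Replays the timestamps quoted in the mail. */
void check_trace_example(void)
{
    const int64_t t_sched = INT64_C(57970746155); /* csched2_schedule() */
    const int64_t t_timer = INT64_C(57979485083); /* s_timer deadline   */
    const int64_t t_burn  = INT64_C(60083283617); /* next burn_credits() */

    const int64_t credit = elapsed_ns(t_sched, t_timer);
    assert(credit == INT64_C(8738928));

    /* How late the vCPU was descheduled with respect to the timer. */
    assert(elapsed_ns(t_timer, t_burn) == INT64_C(2103798534));

    const int64_t delta = elapsed_ns(t_sched, t_burn);
    assert(delta == INT64_C(2112537462));

    const int64_t after = credit_after_burn(credit, delta);
    assert(after == -INT64_C(2103798534));

    /* Below -2^30, so idle wins; not below the assumed minimum. */
    assert(loses_to_idle(after, IDLE_CREDIT_OLD));
    assert(!loses_to_idle(after, IDLE_CREDIT_MIN));
}
```

Calling check_trace_example() from a main() exercises all the checks. The last two assertions illustrate the point of this series: with idle at -2^30 the overrunning vCPU compares lower than idle, whereas with idle at the (assumed) minimum representable value it does not.
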
Of course, that also means that delta is 2112537462ns, and therefore credits will sink to -2103798534!

Also, to the best of my current knowledge, this does not look Credit2 related, as I've observed it when running with Credit1 as well. I personally don't think it is scheduling related in general, but I need to do more investigation to be sure about that (and/or to figure out what the real root cause is).

The reason why Credit2 is affected much more than Credit1 is how time accounting is done. Basically, there's very rudimentary time accounting in Credit1, which is a very bad thing, IMO, but indeed that is also what prevented this issue from causing severe stalls.

One more thing is that Credit2 gives -2^30 credits to the idle vCPU, which was considered to be low enough, and in principle it is. But it's not a robust choice, should an issue like the one we're discussing occur, which is happening. :-)

Therefore, I think we should lower the credits of the idle vCPU to the minimum possible value, so that even in unusual, weird or buggy situations like this one, we will never pick idle instead of an actual vCPU that is ready to run.

This is what is done in the first patch of this series. This is a robustness improvement and a fix (or at least the best way we can deal with it within the scheduler) for the issue at hand. It therefore should be backported.

While looking into this, I also found out that there is an actual bug in the Credit2 code. It is something I introduced myself with commit 5e4b4199667b9 ("xen: credit2: only reset credit on reset condition"). In fact, while it was and still is a good idea to avoid resetting credits too often, the implementation of this was just wrong.

A fix for this bug is what is contained in patch 2. And it also should be backported.

Note that patch 2 alone was also already mitigating the stall/starvation issue quite substantially.
Nevertheless, the proper fix for the issue itself is making Credit2 more robust against similar problems, as done in patch 1, while this other bug just happens to be something which interacts with the symptoms.

This is to say that, although both patches will be backported, as both are actual bugfixes, if there is the need to apply something "in emergency" to fix the starvation problem, applying only patch 1 is enough.

Thanks and Regards
---

Dario Faggioli (2):
      xen: credit2: avoid vCPUs to ever reach lower credits than idle
      xen: credit2: fix credit reset happening too few times

 tools/xentrace/formats     |  2 +-
 tools/xentrace/xenalyze.c  |  8 +++-----
 xen/common/sched/credit2.c | 32 ++++++++++++++------------------
 3 files changed, 18 insertions(+), 24 deletions(-)

--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<> (Raistlin Majere)