From patchwork Fri Jun 28 02:57:23 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 2796791 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 5BE9DBF4A1 for ; Fri, 28 Jun 2013 02:58:49 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 8800820132 for ; Fri, 28 Jun 2013 02:58:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9D56E20131 for ; Fri, 28 Jun 2013 02:58:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754101Ab3F1C6b (ORCPT ); Thu, 27 Jun 2013 22:58:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46749 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753743Ab3F1C6a (ORCPT ); Thu, 27 Jun 2013 22:58:30 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r5S2wMAP008984 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 27 Jun 2013 22:58:24 -0400 Received: from amt.cnet (vpn1-6-237.gru2.redhat.com [10.97.6.237]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r5S2wKmA012754; Thu, 27 Jun 2013 22:58:21 -0400 Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 11FFA104A79; Thu, 27 Jun 2013 23:57:25 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.6/8.14.6/Submit) id r5S2vNIW012948; Thu, 27 Jun 2013 23:57:23 -0300 Date: Thu, 27 Jun 2013 23:57:23 -0300 From: Marcelo Tosatti To: kvm-devel , linux-kernel@vger.kernel.org Cc: Karen Noel , Rik van Riel , Don Zickus , Prarit Bhargava , Thomas Gleixner Subject: watchdog: print stolen time increment at softlockup detection Message-ID: <20130628025723.GA12719@amt.cnet> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP One possibility for a softlockup report in a Linux VM, is that the host system is overcommitted to the point where the watchdog task is unable to make progress (unable to touch the watchdog). Maintain the increment in stolen time for the period of softlockup threshold detection (20 seconds by the default), and report this increment in the softlockup message. Overcommitment is then indicated by a large stolen time increment, accounting for more than, or for a significant percentage of the softlockup threshold. Signed-off-by: Marcelo Tosatti --- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 05039e3..ed09d58 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -34,6 +34,8 @@ int __read_mostly watchdog_thresh = 10; static int __read_mostly watchdog_disabled; static u64 __read_mostly sample_period; +#define SOFT_INTRS_PER_PERIOD 5 + static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog); static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); @@ -127,9 +129,51 @@ static void set_sample_period(void) * and hard thresholds) to increment before the * hardlockup detector generates a warning */ - sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5); + sample_period = get_softlockup_thresh() * + ((u64)NSEC_PER_SEC / SOFT_INTRS_PER_PERIOD); } +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING +struct steal_clock_record { + u64 prev_stolen_time; + u64 stolen_time[SOFT_INTRS_PER_PERIOD]; + int idx; +}; + +static DEFINE_PER_CPU(struct steal_clock_record, steal_record); +static void record_steal_time(void) +{ + struct steal_clock_record *r; + int cpu = smp_processor_id(); + u64 steal_time; + r = &per_cpu(steal_record, cpu); + + steal_time = paravirt_steal_clock(cpu); + r->stolen_time[r->idx] = steal_time - r->prev_stolen_time; + r->idx++; + if (r->idx == SOFT_INTRS_PER_PERIOD) + r->idx = 0; + r->prev_stolen_time = steal_time; +} + +static unsigned int get_accumulated_steal(int cpu) +{ + int idx; + u64 t = 0; + struct steal_clock_record *r = &per_cpu(steal_record, cpu); + + for (idx = 0; idx < SOFT_INTRS_PER_PERIOD; idx++) + t += r->stolen_time[idx]; + + do_div(t, 1000000); + + return t; +} + +#else +static void record_steal_time(void) { return; } +#endif + /* Commands for resetting the watchdog */ static void __touch_watchdog(void) { @@ -271,6 +315,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) /* kick the hardlockup detector */ watchdog_interrupt_count(); + /* record steal time */ + record_steal_time(); + /* kick the softlockup detector */ wake_up_process(__this_cpu_read(softlockup_watchdog)); @@ -316,6 +363,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, current->comm, task_pid_nr(current)); +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING + printk(KERN_EMERG "soft lockup stolen time = %ums\n", + get_accumulated_steal(smp_processor_id())); +#endif print_modules(); print_irqtrace_events(current); if (regs)