From patchwork Wed Feb 28 10:05:55 2018
X-Patchwork-Submitter: Tvrtko Ursulin
X-Patchwork-Id: 10247081
From: Tvrtko Ursulin
To: igt-dev@lists.freedesktop.org
Cc: Tomi Sarvela, Intel-gfx@lists.freedesktop.org
Date: Wed, 28 Feb 2018 10:05:55 +0000
Message-Id: <20180228100555.11734-1-tvrtko.ursulin@linux.intel.com>
Subject: [Intel-gfx] [PATCH i-g-t v2] tests/perf_pmu: Handle CPU hotplug
 failures better

From: Chris Wilson

CPU hotplug, especially CPU0, can be flaky on commodity hardware.

To improve test reliability and response times when testing larger runs
we need to handle those cases better.

Handle failures to off-line a CPU by immediately skipping the test, and
failures to on-line a CPU by immediately rebooting the machine.

This patch includes the igt_sysrq_reboot implementation from Chris Wilson.

v2: Halt by default, reboot if env variable IGT_REBOOT_ON_FATAL_ERROR
    is set. (Petri Latvala)

Signed-off-by: Tvrtko Ursulin
Cc: Chris Wilson
Cc: Petri Latvala
Cc: Tomi Sarvela
---
 lib/Makefile.sources |  2 ++
 lib/igt_core.c       | 23 +++++++++++++++++++++++
 lib/igt_core.h       |  1 +
 lib/igt_sysrq.c      | 22 ++++++++++++++++++++++
 lib/igt_sysrq.h      | 30 ++++++++++++++++++++++++++++++
 lib/meson.build      |  1 +
 tests/perf_pmu.c     | 38 +++++++++++++++++++++++++++++++-------
 7 files changed, 110 insertions(+), 7 deletions(-)
 create mode 100644 lib/igt_sysrq.c
 create mode 100644 lib/igt_sysrq.h

diff --git a/lib/Makefile.sources b/lib/Makefile.sources
index 5b13ef8896c0..3d37ef1d1984 100644
--- a/lib/Makefile.sources
+++ b/lib/Makefile.sources
@@ -35,6 +35,8 @@ lib_source_list = \
 	igt_stats.h		\
 	igt_sysfs.c		\
 	igt_sysfs.h		\
+	igt_sysrq.c		\
+	igt_sysrq.h		\
 	igt_x86.h		\
 	igt_x86.c		\
 	igt_vgem.c		\
diff --git a/lib/igt_core.c b/lib/igt_core.c
index c292343de09e..3fd9f529f09f 100644
--- a/lib/igt_core.c
+++ b/lib/igt_core.c
@@ -70,6 +70,7 @@
 #include "igt_core.h"
 #include "igt_aux.h"
 #include "igt_sysfs.h"
+#include "igt_sysrq.h"
 #include "igt_rc.h"
 
 #define UNW_LOCAL_ONLY
@@ -1136,6 +1137,28 @@ void igt_fail(int exitcode)
 	}
 }
 
+/**
+ * igt_fatal_error:
+ *
+ * Stop test execution or optionally, if the IGT_REBOOT_ON_FATAL_ERROR
+ * environment variable is set, reboot the machine.
+ *
+ * Since our test runner (piglit) does not support fatal test exit codes, we
+ * implement the default behaviour by waiting endlessly.
+ */
+void __attribute__((noreturn)) igt_fatal_error(void)
+{
+	if (igt_check_boolean_env_var("IGT_REBOOT_ON_FATAL_ERROR", false)) {
+		igt_warn("FATAL ERROR - REBOOTING\n");
+		igt_sysrq_reboot();
+	} else {
+		igt_warn("FATAL ERROR\n");
+		for (;;)
+			sleep(60);
+	}
+}
+
+
 /**
  * igt_can_fail:
  *
diff --git a/lib/igt_core.h b/lib/igt_core.h
index 7af2b4c109fe..66523a208c31 100644
--- a/lib/igt_core.h
+++ b/lib/igt_core.h
@@ -311,6 +311,7 @@ void __igt_fail_assert(const char *domain, const char *file,
 		       const char *format, ...)
 	__attribute__((noreturn));
 void igt_exit(void) __attribute__((noreturn));
+void igt_fatal_error(void) __attribute__((noreturn));
 
 /**
  * igt_ignore_warn:
diff --git a/lib/igt_sysrq.c b/lib/igt_sysrq.c
new file mode 100644
index 000000000000..fe3d2e344ff1
--- /dev/null
+++ b/lib/igt_sysrq.c
@@ -0,0 +1,22 @@
+#include <stdlib.h>
+#include <sys/reboot.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+#include "igt_core.h"
+
+#include "igt_sysrq.h"
+
+void igt_sysrq_reboot(void)
+{
+	sync();
+
+	/* Try to be nice at first, and if that fails pull the trigger */
+	if (reboot(RB_AUTOBOOT)) {
+		int fd = open("/proc/sysrq-trigger", O_WRONLY);
+		igt_ignore_warn(write(fd, "b", 2));
+		close(fd);
+	}
+
+	abort();
+}
diff --git a/lib/igt_sysrq.h b/lib/igt_sysrq.h
new file mode 100644
index 000000000000..422473d2a480
--- /dev/null
+++ b/lib/igt_sysrq.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __IGT_SYSRQ_H__
+#define __IGT_SYSRQ_H__
+
+void igt_sysrq_reboot(void) __attribute__((noreturn));
+
+#endif /* __IGT_SYSRQ_H__ */
diff --git a/lib/meson.build b/lib/meson.build
index a9e53689b35d..b3b8b14a3f01 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -14,6 +14,7 @@ lib_sources = [
 	'igt_stats.c',
 	'igt_syncobj.c',
 	'igt_sysfs.c',
+	'igt_sysrq.c',
 	'igt_vgem.c',
 	'igt_x86.c',
 	'instdone.c',
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 3bbb18d2f216..8c75b0641785 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -965,6 +965,7 @@ static void cpu_hotplug(int gem_fd)
 	int link[2];
 	int fd, ret;
 	int cur = 0;
+	char buf;
 
 	igt_skip_on(IS_BROXTON(intel_get_drm_devid(gem_fd)));
 	igt_require(cpu0_hotplug_support());
@@ -1011,9 +1012,32 @@ static void cpu_hotplug(int gem_fd)
 			}
 
 			/* Offline followed by online a CPU. */
-			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			ret = write(cpufd, "0", 2);
+			if (ret < 0) {
+				/*
+				 * If we failed to offline a CPU we don't want
+				 * to proceed.
+				 */
+				igt_warn("Failed to offline cpu%u! (%d)\n",
+					 cpu, errno);
+				igt_assert_eq(write(link[1], "s", 1), 1);
+				break;
+			}
+
 			usleep(1e6);
-			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			ret = write(cpufd, "1", 2);
+			if (ret < 0) {
+				/*
+				 * Failing to bring a CPU back online is fatal
+				 * for the sanity of a test run, so halt or
+				 * reboot immediately.
+				 */
+				igt_warn("Failed to online cpu%u! (%d)\n",
+					 cpu, errno);
+				igt_fatal_error();
+			}
 
 			close(cpufd);
 			cpu++;
@@ -1027,15 +1051,12 @@ static void cpu_hotplug(int gem_fd)
 	 * until the CPU core shuffler finishes one loop.
 	 */
 	for (;;) {
-		char buf;
-		int ret2;
-
 		usleep(500e3);
 		end_spin(gem_fd, spin[cur], 0);
 
 		/* Check if the child is signaling completion. */
-		ret2 = read(link[0], &buf, 1);
-		if ( ret2 == 1 || (ret2 < 0 && errno != EAGAIN))
+		ret = read(link[0], &buf, 1);
+		if ( ret == 1 || (ret < 0 && errno != EAGAIN))
 			break;
 
 		igt_spin_batch_free(gem_fd, spin[cur]);
@@ -1054,6 +1075,9 @@ static void cpu_hotplug(int gem_fd)
 	close(fd);
 	close(link[0]);
 
+	/* Skip if child signals a problem with offlining a CPU. */
+	igt_skip_on(buf == 's');
+
 	assert_within_epsilon(val, ts[1] - ts[0], tolerance);
 }
 
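
The parent/child signalling that cpu_hotplug relies on can be looked at in isolation. What follows is a minimal sketch and not part of the patch: the child sends a single status byte over a pipe, and the parent polls the non-blocking read end, treating EAGAIN as "nothing yet". The names make_status_pipe, poll_status_byte and enum poll_result are hypothetical and exist only for illustration; only the read()/EAGAIN check and the 's' byte mirror the diff. Unlike the test, which keeps looping when read() returns 0, the sketch reports a closed pipe as its own case.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for one poll of the status pipe. */
enum poll_result {
	POLL_PENDING,	/* no byte available yet (EAGAIN) */
	POLL_GOT_BYTE,	/* one status byte was read */
	POLL_CLOSED,	/* all write ends closed, nothing left to read */
	POLL_ERROR	/* read() failed for some other reason */
};

/* Create a pipe whose read end is non-blocking; returns 0 or -1. */
static int make_status_pipe(int link[2])
{
	int flags;

	if (pipe(link) < 0)
		return -1;

	flags = fcntl(link[0], F_GETFL);
	if (flags < 0 || fcntl(link[0], F_SETFL, flags | O_NONBLOCK) < 0) {
		close(link[0]);
		close(link[1]);
		return -1;
	}

	return 0;
}

/* Poll once for a status byte without blocking the caller. */
static enum poll_result poll_status_byte(int fd, char *status)
{
	ssize_t ret = read(fd, status, 1);

	if (ret == 1)
		return POLL_GOT_BYTE;
	if (ret == 0)
		return POLL_CLOSED;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return POLL_PENDING;

	return POLL_ERROR;
}
```

Under these assumptions, a parent loop would keep doing its own work while the result is POLL_PENDING, and only inspect the status byte (for example, skipping on 's') after POLL_GOT_BYTE, so a read error never leads to testing an unset byte.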