From patchwork Fri Sep 11 10:30:26 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Janusz Krzysztofik
X-Patchwork-Id: 11770233
From: Janusz Krzysztofik
To: igt-dev@lists.freedesktop.org
Cc: Michał Winiarski, intel-gfx@lists.freedesktop.org
Date: Fri, 11 Sep 2020 12:30:26 +0200
Message-Id: <20200911103039.4574-12-janusz.krzysztofik@linux.intel.com>
In-Reply-To: <20200911103039.4574-1-janusz.krzysztofik@linux.intel.com>
References: <20200911103039.4574-1-janusz.krzysztofik@linux.intel.com>
Subject: [Intel-gfx] [PATCH i-g-t v6 11/24] tests/core_hotunplug: Recover from subtest failures

Subtests now forcibly call or request igt_abort on failures in order to
avoid silently leaving an exercised device in an unusable state.
However, a failure inside a subtest doesn't always mean the device is
no longer working correctly and a reboot is needed.  Conversely, a
subtest that just fails without aborting doesn't prove the device is
healthy.  We should still perform a device health check in that case
before deciding on next steps.

Reuse the 'failure' structure field as a mark.  The mark is set before
each critical operation that must be followed by a successful health
check in order to avoid aborting the test.  Then follow each subtest
with its own igt_fixture section, in which device file descriptors
potentially left open are closed, a device rediscover or driver rebind
operation is run as needed, and finally the health check is run again
if the preceding igt_subtest section has exited with the mark set.
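The marking scheme described above can be sketched in isolation as
follows.  This is only a minimal illustration under stated assumptions,
not the code from the patch: struct sketch_state, its device_ok field
and all sketch_* functions are hypothetical names standing in for
struct hotunplug and the real helpers, and the health check outcome is
simulated with a flag instead of reopening a device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the test's private state (not struct hotunplug). */
struct sketch_state {
	const char *failure;	/* non-NULL: a successful health check is still owed */
	bool device_ok;		/* simulated result of reopening and checking the device */
};

/* A critical operation sets the mark first and never clears it itself. */
static void sketch_critical_op(struct sketch_state *st)
{
	st->failure = "Critical operation failure!";
	/* the real operation would run here and might bail out early */
}

/* Only a successful health check clears the mark. */
static bool sketch_healthcheck(struct sketch_state *st)
{
	st->failure = "Health check failure!";
	if (!st->device_ok)
		return false;

	st->failure = NULL;
	return true;
}

/* Recovery phase after a subtest: check health only if a mark is pending. */
static void sketch_recover(struct sketch_state *st)
{
	if (st->failure)
		sketch_healthcheck(st);
}

/* Final gate: true means the test would have to abort. */
static bool sketch_must_abort(const struct sketch_state *st)
{
	return st->failure != NULL;
}
```

In this sketch a critical operation always leaves the mark set, so
sketch_must_abort() keeps returning true until a later health check
succeeds.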
v2: Start each recovery phase by unconditionally closing file
    descriptors potentially left open by a subtest before it entered
    its critical section,
  - replace igt_require() with 'if() return;' construct in recover()
    to reduce noise,
  - replace "subtest failure" message used as a request for
    healthcheck with a more appropriate "need healthcheck" for
    clarity,
  - rebase on current upstream master.
v3: Refresh,
  - move bus_rescan() and driver_bind() function calls back from
    healthcheck() to recover() so a pure health check can still be
    called from a subtest if essential,
  - move failure mark assignments back from subtests to helpers for
    more adequate abort reason reporting but clear the mark only on
    health check success,
  - call cleanup() also from post_healthcheck() in order to close a
    device file descriptor potentially left open by a failed health
    check,
  - reword commit message and update description.
v4: Close exercised device fd before failing a health check run,
  - don't drop health checks from subtest bodies, their results should
    always matter.
v5: Refresh.

Signed-off-by: Janusz Krzysztofik
Reviewed-by: Michał Winiarski # v1
---
 tests/core_hotunplug.c | 100 ++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 26 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index d51526029..7fc6df688 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -78,12 +78,18 @@ static int local_close(int fd, const char *warning)

 static int close_device(int fd_drm, const char *when, const char *which)
 {
+	if (fd_drm < 0)	/* not open - return current status */
+		return fd_drm;
+
 	igt_debug("%sclosing %sdevice instance\n", when, which);
 	return local_close(fd_drm, "Device close failed");
 }

 static int close_sysfs(int fd_sysfs_dev)
 {
+	if (fd_sysfs_dev < 0)	/* not open - return current status */
+		return fd_sysfs_dev;
+
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }

@@ -117,24 +123,22 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
+	priv->failure = "Driver unbind failure!";

-	priv->failure = "Driver unbind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }

 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
+	priv->failure = "Driver re-bind failure!";

-	priv->failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }

 /* Remove (virtually unplug) the device from its bus */
@@ -147,12 +151,11 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_assert_fd(priv->fd.sysfs_dev);

 	igt_debug("%sunplugging the device\n", prefix);
+	priv->failure = "Device unplug failure!";

-	priv->failure = "Device unplug timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;

 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
@@ -161,12 +164,17 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
+	priv->failure = "Bus rescan failure!";

-	priv->failure = "Bus rescan timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
+}
+
+static void cleanup(struct hotunplug *priv)
+{
+	priv->fd.drm = close_device(priv->fd.drm, "post ", "failed ");
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }

 static void healthcheck(struct hotunplug *priv)
@@ -180,25 +188,45 @@ static void healthcheck(struct hotunplug *priv)

 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for health check");
-	if (closed)	/* store fd for post_healthcheck if not dirty */
+	if (closed)	/* store fd for cleanup if not dirty */
 		priv->fd.drm = fd_drm;
-	priv->failure = NULL;

 	if (is_i915_device(fd_drm)) {
 		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
 		priv->failure = NULL;
+	} else {
+		/* no device specific healthcheck, rely on reopen result */
+		priv->failure = NULL;
 	}

 	fd_drm = close_device(fd_drm, "", "health checked ");
 	if (closed || fd_drm < -1)	/* update status for post_healthcheck */
 		priv->fd.drm = fd_drm;
+
+	/* not only request igt_abort on failure, also fail the health check */
+	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
+}
+
+static void recover(struct hotunplug *priv)
+{
+	cleanup(priv);
+
+	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
+		bus_rescan(priv);
+
+	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
+		driver_bind(priv);
+
+	if (priv->failure)
+		healthcheck(priv);
 }

 static void post_healthcheck(struct hotunplug *priv)
 {
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);

+	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
 }

@@ -297,30 +325,50 @@ igt_main
 		prepare(&priv);
 	}

-	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
-	igt_subtest("unbind-rebind")
-		unbind_rebind(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_subtest("unbind-rebind")
+			unbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}

 	igt_fixture
 		post_healthcheck(&priv);

-	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
-	igt_subtest("unplug-rescan")
-		unplug_rescan(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_subtest("unplug-rescan")
+			unplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}

 	igt_fixture
 		post_healthcheck(&priv);

-	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
+		igt_subtest("hotunbind-lateclose")
+			hotunbind_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}

 	igt_fixture
 		post_healthcheck(&priv);

-	igt_describe("Check if a still open device can be cleanly unplugged, then released");
-	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a still open device can be cleanly unplugged, then released");
+		igt_subtest("hotunplug-lateclose")
+			hotunplug_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}

 	igt_fixture {
 		post_healthcheck(&priv);