From patchwork Tue Jan 24 00:53:07 2017
X-Patchwork-Submitter: Robert Bragg
X-Patchwork-Id: 9533819
From: Robert Bragg
To: intel-gfx@lists.freedesktop.org
Date: Mon, 23 Jan 2017 16:53:07 -0800
Message-Id: <20170124005307.11750-1-robert@sixbynine.org>
X-Mailer: git-send-email 2.11.0
Subject: [Intel-gfx] [PATCH igt] igt/perf: improve robustness of
 polling/blocking tests
There were a couple of problems with both of these tests that could
lead to false negatives, addressed by this patch:

1) The upper limit for the number of iterations missed a +1 to account
   for a sample that may be immediately available at the start of the
   loop.

2) The tests didn't consider that a duration measured in terms of
   (end - start) ticks could be +- 1 tick, since we don't know the
   fractional part of the tick counts. Our threshold of stime being
   < one tick could therefore give a false negative for any real stime
   between 1 and 10 milliseconds, depending on luck.

The tests now both run for much longer (1000 x the tick duration, or
typically 10 seconds each) so that a single tick represents a much
smaller proportion of the total duration (0.1%), and the stime
thresholds are now set at 1% of the total duration.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
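As a worked example of the arithmetic above (illustrative only, not
part of the patch): assuming the typical CLK_TCK of 100, i.e. 10ms
ticks, the new bounds come out as below. The 40ms constant is the 25Hz
OA sampling period and 46ms adds the assumed 6ms worst-case wakeup/read
latency.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* one scheduler tick in ns; 10000000 when CLK_TCK = 100 */
            int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);

            /* 1000 ticks: a +-1 tick stime error is only 0.1% of this */
            int64_t test_duration_ns = tick_ns * 1000;

            /* 40ms per OA sample, +1 for a sample ready immediately */
            int max_iterations = (test_duration_ns / 40000000ull) + 1;

            /* 40ms per sample plus 6ms assumed wakeup/read latency */
            int min_iterations = (test_duration_ns / 46000000ull);

            /* for 10ms ticks: duration = 10s, min = 217, max = 251 */
            printf("duration = %" PRId64 "ns, min = %d, max = %d\n",
                   test_duration_ns, min_iterations, max_iterations);
            return 0;
    }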
 tests/perf.c | 140 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 94 insertions(+), 46 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index c9c5c57e..f3db84dd 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1263,18 +1263,50 @@ test_blocking(void)
         struct tms end_times;
         int64_t user_ns, kernel_ns;
         int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
+        int64_t test_duration_ns = tick_ns * 1000;
+
+        /* Based on the 40ms OA sampling period set above: max OA samples: */
+        int max_iterations = (test_duration_ns / 40000000ull) + 1;
+
+        /* It's a bit tricky to put a lower limit here, but we expect a
+         * relatively low latency for seeing reports, while we don't currently
+         * give any control over this in the api.
+         *
+         * We assume a maximum latency of 6 milliseconds to deliver a POLLIN
+         * and read() after a new sample is written (46ms per iteration),
+         * given that the driver uses a 200Hz hrtimer (5ms period) to check
+         * for data, and allowing some time to read().
+         */
+        int min_iterations = (test_duration_ns / 46000000ull);
+
         int64_t start;
         int n = 0;
 
         times(&start_times);
 
-        /* Loop for 600ms performing blocking reads while the HW is sampling at
+        igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+                  (int)tick_ns, test_duration_ns,
+                  min_iterations, max_iterations);
+
+        /* In the loop we perform blocking reads while the HW is sampling at
          * ~25Hz, with the expectation that we spend most of our time blocked
          * in the kernel, and shouldn't be burning cpu cycles in the kernel in
          * association with this process (verified by looking at stime before
          * and after loop).
+         *
+         * We're looking to assert that less than 1% of the test duration is
+         * spent in the kernel dealing with polling and read()ing.
+         *
+         * The test runs for a relatively long time considering the very low
+         * resolution of stime in ticks of typically 10 milliseconds. Since we
+         * don't know the fractional part of the tick values we read from
+         * userspace, our minimum threshold needs to be >= one tick, since any
+         * measurement might really be +- tick_ns (assuming we effectively get
+         * floor(real_stime)).
+         *
+         * We loop for 1000 x tick_ns so one tick corresponds to 0.1%.
          */
-        for (start = get_time(); (get_time() - start) < 600000000; /* nop */) {
+        for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
                 int ret;
 
                 while ((ret = read(stream_fd, buf, sizeof(buf))) < 0 &&
@@ -1294,33 +1326,26 @@
 
         user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
         kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-        igt_debug("%d blocking reads in 500 milliseconds, with 1KHz OA sampling\n", n);
-        igt_debug("time in userspace = %"PRIu64"ns (start utime = %d, end = %d, ns ticks per sec = %d)\n",
-                  user_ns, (int)start_times.tms_utime, (int)end_times.tms_utime, (int)tick_ns);
-        igt_debug("time in kernelspace = %"PRIu64"ns (start stime = %d, end = %d, ns ticks per sec = %d)\n",
-                  kernel_ns, (int)start_times.tms_stime, (int)end_times.tms_stime, (int)tick_ns);
+        igt_debug("%d blocking reads during test with 25Hz OA sampling\n", n);
+        igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
+                  user_ns, (int)tick_ns,
+                  (int)start_times.tms_utime, (int)end_times.tms_utime);
+        igt_debug("time in kernelspace = %"PRIu64"ns (+-%dns) (start stime = %d, end = %d)\n",
+                  kernel_ns, (int)tick_ns,
+                  (int)start_times.tms_stime, (int)end_times.tms_stime);
+
         /* With completely broken blocking (but also not returning an error) we
-         * could end up with an open loop, hopefully recognisable with > 15
-         * (600/40)iterations.
+         * could end up with an open loop.
          */
-        igt_assert(n <= 15);
+        igt_assert(n <= max_iterations);
 
-        /* It's a bit tricky to put a lower limit here, but we expect a
-         * relatively low latency for seeing reports, while we don't currently
-         * give any control over this in the api.
-         *
-         * Limited to a 5 millisecond latency and 45ms (worst case)
-         * per-iteration that could give 13.3 iterations. Rounding gives a tiny
-         * bit more latency slack (6ms)...
+        /* Make sure the driver is reporting new samples with a reasonably
+         * low latency...
          */
-        igt_assert(n > 13);
+        igt_assert(n > min_iterations);
 
-        /* A bit tricky to put a number on this, but we don't expect the kernel
-         * to use any significant cpu while waiting and given the in precision
-         * of stime (multiple of CLK_TCK) we expect this to round to zero.
-         */
-        igt_assert_eq(kernel_ns, 0);
+        igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
         close(stream_fd);
 }
@@ -1356,18 +1381,49 @@ test_polling(void)
         struct tms end_times;
         int64_t user_ns, kernel_ns;
         int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
+        int64_t test_duration_ns = tick_ns * 1000;
+
+        /* Based on the 40ms OA sampling period set above: max OA samples: */
+        int max_iterations = (test_duration_ns / 40000000ull) + 1;
+
+        /* It's a bit tricky to put a lower limit here, but we expect a
+         * relatively low latency for seeing reports, while we don't currently
+         * give any control over this in the api.
+         *
+         * We assume a maximum latency of 6 milliseconds to deliver a POLLIN
+         * and read() after a new sample is written (46ms per iteration),
+         * given that the driver uses a 200Hz hrtimer (5ms period) to check
+         * for data, and allowing some time to read().
+         */
+        int min_iterations = (test_duration_ns / 46000000ull);
         int64_t start;
         int n = 0;
 
         times(&start_times);
 
-        /* Loop for 600ms performing blocking polls while the HW is sampling at
+        igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+                  (int)tick_ns, test_duration_ns,
+                  min_iterations, max_iterations);
+
+        /* In the loop we perform blocking polls while the HW is sampling at
          * ~25Hz, with the expectation that we spend most of our time blocked
          * in the kernel, and shouldn't be burning cpu cycles in the kernel in
          * association with this process (verified by looking at stime before
          * and after loop).
+         *
+         * We're looking to assert that less than 1% of the test duration is
+         * spent in the kernel dealing with polling and read()ing.
+         *
+         * The test runs for a relatively long time considering the very low
+         * resolution of stime in ticks of typically 10 milliseconds. Since we
+         * don't know the fractional part of the tick values we read from
+         * userspace, our minimum threshold needs to be >= one tick, since any
+         * measurement might really be +- tick_ns (assuming we effectively get
+         * floor(real_stime)).
+         *
+         * We loop for 1000 x tick_ns so one tick corresponds to 0.1%.
          */
-        for (start = get_time(); (get_time() - start) < 600000000; /* nop */) {
+        for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
                 struct pollfd pollfd = { .fd = stream_fd, .events = POLLIN };
                 int ret;
@@ -1418,33 +1474,25 @@
 
         user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
         kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-        igt_debug("%d blocking poll()s in 600 milliseconds, with 25Hz OA sampling\n", n);
-        igt_debug("time in userspace = %"PRIu64"ns (start utime = %d, end = %d, ns ticks per sec = %d)\n",
-                  user_ns, (int)start_times.tms_utime, (int)end_times.tms_utime, (int)tick_ns);
-        igt_debug("time in kernelspace = %"PRIu64"ns (start stime = %d, end = %d, ns ticks per sec = %d)\n",
-                  kernel_ns, (int)start_times.tms_stime, (int)end_times.tms_stime, (int)tick_ns);
+        igt_debug("%d blocking poll()s during test with 25Hz OA sampling\n", n);
+        igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
+                  user_ns, (int)tick_ns,
+                  (int)start_times.tms_utime, (int)end_times.tms_utime);
+        igt_debug("time in kernelspace = %"PRIu64"ns (+-%dns) (start stime = %d, end = %d)\n",
+                  kernel_ns, (int)tick_ns,
+                  (int)start_times.tms_stime, (int)end_times.tms_stime);
 
         /* With completely broken blocking while polling (but still somehow
-         * reporting a POLLIN event) we could end up with an open loop,
-         * hopefully recognisable with > 15 (600/40)iterations.
+         * reporting a POLLIN event) we could end up with an open loop.
          */
-        igt_assert(n <= 15);
+        igt_assert(n <= max_iterations);
 
-        /* It's a bit tricky to put a lower limit here, but we expect a
-         * relatively low latency for seeing reports, while we don't currently
-         * give any control over this in the api.
-         *
-         * Limited to a 5 millisecond latency and 45ms (worst case)
-         * per-iteration that could give 13.3 iterations. Rounding gives a tiny
-         * bit more latency slack (6ms)...
+        /* Make sure the driver is reporting new samples with a reasonably
+         * low latency...
          */
-        igt_assert(n > 13);
+        igt_assert(n > min_iterations);
 
-        /* A bit tricky to put a number on this, but we don't expect the kernel
-         * to use any significant cpu while waiting and given the in precision
-         * of stime (multiple of CLK_TCK) we expect this to round to zero.
-         */
-        igt_assert_eq(kernel_ns, 0);
+        igt_assert(kernel_ns <= (test_duration_ns / 100ull));
 
         close(stream_fd);
 }
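As background for reviewers, the measurement pattern both tests rely on
reduces to the following minimal sketch (illustrative only, not part of
the patch; some_fd is a hypothetical stand-in for the i915 perf stream
fd, and error handling is omitted):

    #include <poll.h>
    #include <stdint.h>
    #include <sys/times.h>
    #include <time.h>
    #include <unistd.h>

    /* CLOCK_MONOTONIC time in ns, mirroring the test's get_time() helper */
    static int64_t get_time(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec * 1000000000ll + ts.tv_nsec;
    }

    /* Block on some_fd for duration_ns and return the kernel (stime) cpu
     * time consumed, which is only accurate to +- one tick: hence the
     * patch's move to a 1000 tick duration and a 1% stime threshold. */
    static int64_t measure_kernel_time(int some_fd, int64_t duration_ns)
    {
            int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
            struct tms start_times, end_times;
            int64_t start;

            times(&start_times);
            for (start = get_time(); (get_time() - start) < duration_ns;) {
                    struct pollfd pollfd = { .fd = some_fd, .events = POLLIN };

                    poll(&pollfd, 1, -1); /* block until a sample arrives */
                    /* ... read() and discard the pending data here ... */
            }
            times(&end_times);

            /* tms_stime is quantized to whole scheduler ticks */
            return (end_times.tms_stime - start_times.tms_stime) * tick_ns;
    }

Without data to consume, the loop above would spin on POLLIN; the real
tests avoid that by configuring ~25Hz OA sampling before entering the
loop, so most of the duration is spent blocked in the kernel.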