From patchwork Mon Aug  6 23:11:09 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Fuerst <svfuerst@gmail.com>
X-Patchwork-Id: 1283821
Return-Path: 
 <dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org>
X-Original-To: patchwork-dri-devel@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by patchwork1.kernel.org (Postfix) with ESMTP id DECE03FC23
	for <patchwork-dri-devel@patchwork.kernel.org>;
	Tue,  7 Aug 2012 07:20:00 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id F1D3E9EC2A
	for <patchwork-dri-devel@patchwork.kernel.org>;
	Tue,  7 Aug 2012 00:20:00 -0700 (PDT)
X-Original-To: dri-devel@lists.freedesktop.org
Delivered-To: dri-devel@lists.freedesktop.org
Received: from mail-pb0-f49.google.com (mail-pb0-f49.google.com
	[209.85.160.49])
	by gabe.freedesktop.org (Postfix) with ESMTP id 366A49E7F0
	for <dri-devel@lists.freedesktop.org>;
	Mon,  6 Aug 2012 16:11:44 -0700 (PDT)
Received: by mail-pb0-f49.google.com with SMTP id rq13so6672756pbb.36
	for <dri-devel@lists.freedesktop.org>;
	Mon, 06 Aug 2012 16:11:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references;
	bh=21q1zEgx1mIovplQ/KDQNfb32oLLEncqkUpaTr7uLIE=;
	b=IQnzw9P94ZwL7bK7+rxIkLFoCwnt4J3FdjyYkEpZ2cEkgBhT3PFZ1e5pphSd9hymfe
	MHRrMN5cUtI2ObJJzJajMUuNijHlgCr1cfSzUCswUKSqXxSYM0jnk02IyplRu2hvQo3q
	Jr8l+mwwLIEjfoOVKoihTF6DHbn0NogoqcAa8ief7HWZAOO7MxAphk4RZNuK86ymlf11
	grAf4DQZZIoCHLvIRGrt2FCK2mS1o32imJ1Sxqiq94T3y2fPFZM1ZNrkFfU996FoqJT2
	K+DV1T4ZJYAZqkUBDx1WSGvuZcLpI6D4V5LIej4b/AtDlGXvwGOXBtB0cjJazgjpSVmQ
	POaA==
Received: by 10.68.136.233 with SMTP id qd9mr22576639pbb.166.1344294704177;
	Mon, 06 Aug 2012 16:11:44 -0700 (PDT)
Received: from localhost.localdomain (c-24-18-84-54.hsd1.wa.comcast.net.
	[24.18.84.54]) by mx.google.com with ESMTPS id
	oo6sm6388795pbc.22.2012.08.06.16.11.43
	(version=TLSv1/SSLv3 cipher=OTHER);
	Mon, 06 Aug 2012 16:11:43 -0700 (PDT)
From: Steven Fuerst <svfuerst@gmail.com>
To: dri-devel@lists.freedesktop.org
Subject: [[PATCH][RESENT] 2/3] Replace i2f() in r600_blit_kms.c with an
	optimized version.
Date: Mon,  6 Aug 2012 16:11:09 -0700
Message-Id: <1344294671-13905-2-git-send-email-svfuerst@gmail.com>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1344294671-13905-1-git-send-email-svfuerst@gmail.com>
References: <1344294671-13905-1-git-send-email-svfuerst@gmail.com>
X-Mailman-Approved-At: Mon, 06 Aug 2012 22:55:34 -0700
Cc: Steven Fuerst <svfuerst@gmail.com>
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
	<dri-devel.lists.freedesktop.org>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: 
 dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org
Errors-To: 
 dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org

Use __fls() to find the most significant set bit, which removes the
need for the shift-and-count loop.  A second trick is to use the mod(32)
behaviour of the rotate instructions on x86, so that a single rotate
handles both the left-shift and the right-shift case.  This expands the
range of the unsigned int to float conversion to the full 32 bits.

The routine is now exact for inputs up to 2^24.  Above that, the low
bits are truncated, which is equivalent to rounding towards zero.

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit_kms.c |   53 ++++++++++++++------------------
 1 file changed, 23 insertions(+), 30 deletions(-)
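
For reviewers who want to try the conversion outside the kernel, here
is a minimal userspace sketch of the same idea.  It is not part of the
patch.  The names ror32_portable(), fls_index(), i2f_sketch() and
float_bits() are hypothetical stand-ins introduced only for this
illustration.  Unlike the kernel code, the rotate here reduces its
count mod 32 explicitly, so the sketch should not depend on x86 shift
behaviour.

```c
#include <stdint.h>
#include <string.h>

#define I2F_FRAC_BITS	23
#define I2F_MASK	((1u << I2F_FRAC_BITS) - 1)

/* Hypothetical stand-in for ror32(): rotate right, count taken mod 32. */
static uint32_t ror32_portable(uint32_t word, unsigned int shift)
{
	shift &= 31;
	return (word >> shift) | (word << ((32 - shift) & 31));
}

/* Hypothetical stand-in for __fls(): index of the top set bit, x != 0. */
static unsigned int fls_index(uint32_t x)
{
	unsigned int msb = 0;

	while (x >>= 1)
		msb++;
	return msb;
}

/* Same steps as the patched i2f(): locate msb, rotate, mask, add exponent. */
static uint32_t i2f_sketch(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	if (!x)
		return 0;

	msb = fls_index(x);

	/* msb < 23 wraps to a left rotate; msb > 23 drops the low bits. */
	fraction = ror32_portable(x, msb - I2F_FRAC_BITS) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

/* Bit pattern of a float, for comparing against the FPU conversion. */
static uint32_t float_bits(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));
	return bits;
}
```

For inputs up to 2^24, i2f_sketch(x) should match float_bits((float)x).
Above that the two can differ, because the FPU rounds to nearest while
this routine truncates.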

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 2bef854..8307558 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -455,44 +455,37 @@ set_default_state(struct radeon_device *rdev)
 	radeon_ring_write(ring, sq_stack_resource_mgmt_2);
 }
 
-#define I2F_MAX_BITS 15
-#define I2F_MAX_INPUT  ((1 << I2F_MAX_BITS) - 1)
-#define I2F_SHIFT (24 - I2F_MAX_BITS)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS	23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
 
 /*
  * Converts unsigned integer into 32-bit IEEE floating point representation.
- * Conversion is not universal and only works for the range from 0
- * to 2^I2F_MAX_BITS-1. Currently we only use it with inputs between
- * 0 and 16384 (inclusive), so I2F_MAX_BITS=15 is enough. If necessary,
- * I2F_MAX_BITS can be increased, but that will add to the loop iterations
- * and slow us down. Conversion is done by shifting the input and counting
- * down until the first 1 reaches bit position 23. The resulting counter
- * and the shifted input are, respectively, the exponent and the fraction.
- * The sign is always zero.
+ * Exact for inputs from 0 to 2^24.  Above that, we round towards zero
+ * as the low-order bits will not fit in a float.  (It would be better to
+ * round to nearest even as the FPU does, but that is slower.)
+ * This routine depends on the mod(32) behaviour of the rotate instructions
+ * on x86.
  */
-static uint32_t i2f(uint32_t input)
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
+	uint32_t msb, exponent, fraction;
 
-	WARN_ON_ONCE(input > I2F_MAX_INPUT);
+	/* Zero is special */
+	if (!x) return 0;
 
-	if ((input & I2F_MAX_INPUT) == 0)
-		result = 0;
-	else {
-		exponent = 126 + I2F_MAX_BITS;
-		fraction = (input & I2F_MAX_INPUT) << I2F_SHIFT;
+	/* Get location of the most significant bit */
+	msb = __fls(x);
 
-		for (i = 0; i < I2F_MAX_BITS; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1;
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff);
-	}
-	return result;
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) behaviour.  This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
 
 int r600_blit_init(struct radeon_device *rdev)