From patchwork Sat Aug 11 17:30:20 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Fuerst <svfuerst@gmail.com>
X-Patchwork-Id: 1311691
From: Steven Fuerst <svfuerst@gmail.com>
To: dri-devel@lists.freedesktop.org
Subject: [Patch v2 2/4] Replace i2f() in r600_blit_kms.c with an optimized
	version.
Date: Sat, 11 Aug 2012 10:30:20 -0700
Message-Id: <1344706222-3018-2-git-send-email-svfuerst@gmail.com>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1344706222-3018-1-git-send-email-svfuerst@gmail.com>
References: <1344706222-3018-1-git-send-email-svfuerst@gmail.com>
Cc: Steven Fuerst <svfuerst@gmail.com>

Use __fls() to find the most significant set bit, which removes the
need for the normalisation loop.  A second trick is to rotate rather
than shift: because the rotate count is taken mod 32, a single ror32()
covers both the leftward and the rightward case, which extends the
unsigned int to float conversion to the full 32 bits without a branch.

The routine is now exact up to 2^24.  Above that, the low-order bits
that do not fit in the float are truncated, which is equivalent to
rounding towards zero.

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit_kms.c |   51 +++++++++++++-------------------
 1 file changed, 21 insertions(+), 30 deletions(-)
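
Not part of the patch: a minimal user-space sketch of the same
conversion, for anyone who wants to check the rotate trick outside the
kernel.  The names i2f_sketch(), ror32_sketch(), fls_sketch(),
float_bits() and i2f_count_mismatches() are made up for illustration.
The first two helpers are stand-ins for the kernel's ror32() and
__fls(); the loop in fls_sketch() is only there for portability.  The
sketch assumes that float is IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define I2F_FRAC_BITS	23
#define I2F_MASK	((1u << I2F_FRAC_BITS) - 1)

/* Stand-in for the kernel's ror32(); the masks keep both shifts defined. */
static uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Stand-in for __fls(): index of the highest set bit.  x must be nonzero. */
static uint32_t fls_sketch(uint32_t x)
{
	uint32_t msb = 0;

	while (x >>= 1)
		msb++;
	return msb;
}

/* Same arithmetic as the patched i2f(), built on the stand-ins above. */
static uint32_t i2f_sketch(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	if (!x)
		return 0;

	msb = fls_sketch(x);

	/*
	 * msb < 23:  the count wraps to 32 - (23 - msb), i.e. a left shift.
	 * msb >= 23: the count is msb - 23, i.e. a right shift (truncation).
	 * Either way the leading 1 lands on bit 23 and is masked off.
	 */
	fraction = ror32_sketch(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

/* Bit pattern of a float, used as the reference result. */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));
	return u;
}

/* Count inputs in [0, limit] where the sketch and the compiler disagree. */
static uint32_t i2f_count_mismatches(uint32_t limit)
{
	uint32_t x, bad = 0;

	for (x = 0; x <= limit && x != UINT32_MAX; x++)
		if (i2f_sketch(x) != float_bits((float)x))
			bad++;
	return bad;
}
```

Under those assumptions, i2f_count_mismatches(1u << 24) called from a
test main() should return 0.  Above 2^24 the two conversions can
differ: for 0xffffffff the sketch yields 0x4f7fffff, whereas a
round-to-nearest conversion gives 2^32 (0x4f800000).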

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 2bef854..e5a40ca 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -455,44 +455,35 @@ set_default_state(struct radeon_device *rdev)
 	radeon_ring_write(ring, sq_stack_resource_mgmt_2);
 }
 
-#define I2F_MAX_BITS 15
-#define I2F_MAX_INPUT  ((1 << I2F_MAX_BITS) - 1)
-#define I2F_SHIFT (24 - I2F_MAX_BITS)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS	23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
 
 /*
  * Converts unsigned integer into 32-bit IEEE floating point representation.
- * Conversion is not universal and only works for the range from 0
- * to 2^I2F_MAX_BITS-1. Currently we only use it with inputs between
- * 0 and 16384 (inclusive), so I2F_MAX_BITS=15 is enough. If necessary,
- * I2F_MAX_BITS can be increased, but that will add to the loop iterations
- * and slow us down. Conversion is done by shifting the input and counting
- * down until the first 1 reaches bit position 23. The resulting counter
- * and the shifted input are, respectively, the exponent and the fraction.
- * The sign is always zero.
+ * Will be exact from 0 to 2^24.  Above that, we round towards zero
+ * as the low-order bits will not fit in a float.  (It would be better to
+ * round to nearest even as the fpu does, but that is slower.)
  */
-static uint32_t i2f(uint32_t input)
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
+	uint32_t msb, exponent, fraction;
 
-	WARN_ON_ONCE(input > I2F_MAX_INPUT);
+	/* Zero is special */
+	if (!x) return 0;
 
-	if ((input & I2F_MAX_INPUT) == 0)
-		result = 0;
-	else {
-		exponent = 126 + I2F_MAX_BITS;
-		fraction = (input & I2F_MAX_INPUT) << I2F_SHIFT;
+	/* Get location of the most significant bit */
+	msb = __fls(x);
 
-		for (i = 0; i < I2F_MAX_BITS; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1;
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff);
-	}
-	return result;
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) behaviour.  This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
 
 int r600_blit_init(struct radeon_device *rdev)