From patchwork Sat Aug 11 17:30:20 2012
X-Patchwork-Submitter: Steven Fuerst
X-Patchwork-Id: 1311691
From: Steven Fuerst
To: dri-devel@lists.freedesktop.org
Cc: Steven Fuerst
Subject: [Patch v2 2/4] Replace i2f() in r600_blit_kms.c with an optimized version.
Date: Sat, 11 Aug 2012 10:30:20 -0700
Message-Id: <1344706222-3018-2-git-send-email-svfuerst@gmail.com>
In-Reply-To: <1344706222-3018-1-git-send-email-svfuerst@gmail.com>
References: <1344706222-3018-1-git-send-email-svfuerst@gmail.com>

We use __fls() to find the most significant bit.  Using that, the
loop can be avoided.  A second trick is to use the behaviour of the
rotate instructions to expand the range of the unsigned int to float
conversion to the full 32 bits in a branchless way.

The routine is now exact up to 2^24.  Above that, we truncate which
is equivalent to rounding towards zero.

Signed-off-by: Steven Fuerst
---
 drivers/gpu/drm/radeon/r600_blit_kms.c | 51 +++++++++++++-------------------
 1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 2bef854..e5a40ca 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -455,44 +455,35 @@ set_default_state(struct radeon_device *rdev)
 	radeon_ring_write(ring, sq_stack_resource_mgmt_2);
 }
 
-#define I2F_MAX_BITS 15
-#define I2F_MAX_INPUT ((1 << I2F_MAX_BITS) - 1)
-#define I2F_SHIFT (24 - I2F_MAX_BITS)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS 23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
 
 /*
  * Converts unsigned integer into 32-bit IEEE floating point representation.
- * Conversion is not universal and only works for the range from 0
- * to 2^I2F_MAX_BITS-1. Currently we only use it with inputs between
- * 0 and 16384 (inclusive), so I2F_MAX_BITS=15 is enough. If necessary,
- * I2F_MAX_BITS can be increased, but that will add to the loop iterations
- * and slow us down. Conversion is done by shifting the input and counting
- * down until the first 1 reaches bit position 23. The resulting counter
- * and the shifted input are, respectively, the exponent and the fraction.
- * The sign is always zero.
+ * Will be exact from 0 to 2^24. Above that, we round towards zero
+ * as the fractional bits will not fit in a float. (It would be better to
+ * round towards even as the fpu does, but that is slower.)
  */
-static uint32_t i2f(uint32_t input)
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
+	uint32_t msb, exponent, fraction;
 
-	WARN_ON_ONCE(input > I2F_MAX_INPUT);
+	/* Zero is special */
+	if (!x) return 0;
 
-	if ((input & I2F_MAX_INPUT) == 0)
-		result = 0;
-	else {
-		exponent = 126 + I2F_MAX_BITS;
-		fraction = (input & I2F_MAX_INPUT) << I2F_SHIFT;
+	/* Get location of the most significant bit */
+	msb = __fls(x);
 
-		for (i = 0; i < I2F_MAX_BITS; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1;
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff);
-	}
-	return result;
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) behaviour. This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
 
 int r600_blit_init(struct radeon_device *rdev)
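
For anyone who wants to try the rotate trick outside the kernel, here is a
minimal userspace sketch of the same idea. It is not the driver code. It
assumes GCC or Clang (for __builtin_clz), and fls_sketch(), ror32_sketch(),
i2f_sketch() and float_bits() are hypothetical stand-ins for the kernel's
__fls() and ror32(); none of those names are part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 23 bits of float fractional data, as in the patch */
#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/*
 * Hypothetical userspace stand-in for ror32(). Both shift counts are
 * masked to 0..31, so a rotate by 0 does not turn into a shift by 32.
 */
static uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Hypothetical userspace stand-in for __fls(): index of the highest set
 * bit. Assumes GCC/Clang; __builtin_clz(0) is undefined, hence the assert.
 */
static unsigned int fls_sketch(uint32_t x)
{
	assert(x != 0);
	return 31u - (unsigned int)__builtin_clz(x);
}

/* Same steps as the new i2f(): find the msb, rotate, mask, add exponent. */
static uint32_t i2f_sketch(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	/* Zero has no set bit and maps to +0.0f */
	if (!x)
		return 0;

	msb = fls_sketch(x);

	/*
	 * (msb - 23) mod 32 is a right rotate when msb >= 23 and acts as a
	 * left shift when msb < 23. Either way the leading 1 lands on bit 23
	 * and is then masked off as the implicit bit.
	 */
	fraction = ror32_sketch(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

/* Bit pattern of a float, for comparing against the compiler's conversion */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));
	return u;
}
```

With these helpers, i2f_sketch(16384) gives 0x46800000, the bit pattern of
16384.0f, and for any x up to 2^24 the result should match
float_bits((float)x). Above 2^24 the low bits are dropped, so
i2f_sketch(0x1000001) gives the same pattern as i2f_sketch(0x1000000).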