From patchwork Sun Jan 11 02:44:49 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kenneth Graunke
X-Patchwork-Id: 5605711
From: Kenneth Graunke
To: intel-gfx@lists.freedesktop.org
Date: Sat, 10 Jan 2015 18:44:49 -0800
Message-Id:
 <1420944289-832-1-git-send-email-kenneth@whitecape.org>
X-Mailer: git-send-email 2.2.1
Cc: mesa-dev@lists.freedesktop.org
Subject: [Intel-gfx] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer. Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes for finding this missing bit!
Thanks to Chris Wilson for helping me find where to set it.

Signed-off-by: Kenneth Graunke
Cc: Jesse Barnes
Signed-off-by: Ben Widawsky
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Here's an alternate patch which implements the workaround in the kernel
instead of Mesa. It's probably better to do it there, since the kernel
does it on Haswell already.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
 			  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
 
+	/* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
+	 * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
+	 * polygons in the same 8x4 pixel/sample area to be processed without
+	 * stalling waiting for the earlier ones to write to Hierarchical Z
+	 * buffer."
+	 *
+	 * This optimization is off by default for Broadwell; turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Wa4x4STCOptimizationDisable:bdw */
 	WA_SET_BIT_MASKED(CACHE_MODE_1,
 			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
@@ -836,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 			  HDC_FORCE_NON_COHERENT |
 			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+	/* According to the CACHE_MODE_0 default value documentation, some
+	 * CHV platforms disable this optimization by default. Turn it on.
+	 */
+	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Improve HiZ throughput on CHV. */
 	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
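
For readers unfamiliar with masked registers, here is a rough standalone sketch of what clearing HIZ_RAW_STALL_OPT_DISABLE through a masked write amounts to. It is not the driver code: the helper names (masked_bit_enable, masked_bit_disable, apply_masked_write) are hypothetical, the bit position (1 << 2) is my assumption and should be checked against i915_reg.h, and the model assumes the usual convention where the upper 16 bits of the written value select which of the lower 16 bits get updated.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only; verify against i915_reg.h. */
#define HIZ_RAW_STALL_OPT_DISABLE (1u << 2)

/* Build a write value that sets `bits`: mask in the high half, value in the low half. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	assert((bits & ~0xffffu) == 0);	/* only the low 16 bits are writable */
	return (bits << 16) | bits;
}

/* Build a write value that clears `bits`: mask in the high half, zeros in the low half. */
static uint32_t masked_bit_disable(uint32_t bits)
{
	assert((bits & ~0xffffu) == 0);
	return bits << 16;
}

/*
 * Software model of the hardware side: only the bits selected by the
 * mask (high half of `write`) change; every other bit of `reg` is kept.
 */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;
	uint32_t value = write & 0xffffu;

	return (reg & ~mask & 0xffffu) | (value & mask);
}
```

Under those assumptions, a register holding 0x0005 (disable bit set, plus one unrelated bit) becomes 0x0001 after apply_masked_write(0x0005, masked_bit_disable(HIZ_RAW_STALL_OPT_DISABLE)). The point of the masked encoding is that no read-modify-write is needed and unrelated bits are left alone.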