From patchwork Wed Aug 20 14:19:17 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: arun.siluvery@linux.intel.com
X-Patchwork-Id: 4751971
Return-Path:
X-Original-To: patchwork-intel-gfx@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 6CAC99F375 for ; Wed, 20 Aug 2014 14:19:50 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 576FA2015A for ; Wed, 20 Aug 2014 14:19:49 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 3225120155 for ; Wed, 20 Aug 2014 14:19:48 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DA8996E6E5; Wed, 20 Aug 2014 07:19:47 -0700 (PDT)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id 210CE6E6E5 for ; Wed, 20 Aug 2014 07:19:46 -0700 (PDT)
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP; 20 Aug 2014 07:19:45 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.01,902,1400050800"; d="scan'208";a="587480858"
Received: from asiluver-linux.isw.intel.com ([10.102.226.49]) by fmsmga002.fm.intel.com with ESMTP; 20 Aug 2014 07:19:44 -0700
From: Arun Siluvery
To: intel-gfx@lists.freedesktop.org
Date: Wed, 20 Aug 2014 15:19:17 +0100
Message-Id: <1408544358-26735-2-git-send-email-arun.siluvery@linux.intel.com>
X-Mailer: git-send-email 2.0.4
In-Reply-To: <1408544358-26735-1-git-send-email-arun.siluvery@linux.intel.com>
References: <1408544358-26735-1-git-send-email-arun.siluvery@linux.intel.com>
Subject: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds using the golden render state
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Workarounds for bdw are currently applied in init_clock_gating(), but
they are lost following a GPU reset. Some of the WA registers are part
of the register state context and are restored on every context switch,
so initializing them in the golden render state ensures that they are
applied even when we start with an uninitialized context or during hw
initialization following a reset.

v2: Add comments corresponding to WAs in golden render state (Chris).

Generating the render state is not a straightforward process. Ideally
the WA values would be added while the state is being set up rather
than by an external tool, but that is left for a follow-up patch.

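A note for readers decoding the raw dwords that this patch adds to gen8_null_state_batch[]: each workaround appears as a three-dword sequence (header 0x11000001, register offset, value). For the masked registers the enable bits are repeated in both halves of the value, which is what _MASKED_BIT_ENABLE() produced in the code removed from intel_pm.c. The sketch below is illustrative only. masked_bit_enable() and emit_masked_wa() are hypothetical helper names, not i915 functions, and reading the header as a load-register-immediate command (opcode 0x22 shifted left by 23, length field 1) is an assumption inferred from the value in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed decoding of the 0x11000001 header seen in the batch:
 * opcode 0x22 in the upper bits, length field 1 (one register/value pair). */
#define LRI_HEADER_ONE_REG ((UINT32_C(0x22) << 23) | 1)

/* Masked register write: the upper 16 bits select which of the lower
 * 16 bits are updated, so an "enable" repeats the bits in both halves. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	assert(bits <= 0xffff);
	return (bits << 16) | bits;
}

/* Hypothetical helper: append one masked workaround write to a batch
 * array and return the new write position. */
static size_t emit_masked_wa(uint32_t *batch, size_t pos,
			     uint32_t reg, uint32_t bits)
{
	batch[pos++] = LRI_HEADER_ONE_REG;
	batch[pos++] = reg;
	batch[pos++] = masked_bit_enable(bits);
	return pos;
}
```

With register offset 0xe4f4 and bit 0x0001 this yields the triple 0x11000001, 0x0000e4f4, 0x00010001, matching the GEN7_ROW_CHICKEN2 entry in the diff. The GEN7_GT_MODE entry (0x02800200) does not follow this pattern because its mask covers more bits than the value sets.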
Signed-off-by: Arun Siluvery
---
 drivers/gpu/drm/i915/intel_pm.c               | 49 --------------
 drivers/gpu/drm/i915/intel_renderstate_gen8.c | 95 ++++++++++++++++++++-------
 2 files changed, 72 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c8f744c..bcae3dc 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5507,101 +5507,52 @@ static void gen8_init_clock_gating(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	enum pipe pipe;
 
 	I915_WRITE(WM3_LP_ILK, 0);
 	I915_WRITE(WM2_LP_ILK, 0);
 	I915_WRITE(WM1_LP_ILK, 0);
 
 	/* FIXME(BDW): Check all the w/a, some might only apply to
 	 * pre-production hw. */
 
-	/* WaDisablePartialInstShootdown:bdw */
-	I915_WRITE(GEN8_ROW_CHICKEN,
-		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
-
-	/* WaDisableThreadStallDopClockGating:bdw */
-	/* FIXME: Unclear whether we really need this on production bdw. */
-	I915_WRITE(GEN8_ROW_CHICKEN,
-		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
-
-	/*
-	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
-	 * pre-production hardware
-	 */
-	I915_WRITE(HALF_SLICE_CHICKEN3,
-		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
-	I915_WRITE(HALF_SLICE_CHICKEN3,
-		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
 	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
 
 	I915_WRITE(_3D_CHICKEN3,
 		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
 
-	I915_WRITE(COMMON_SLICE_CHICKEN2,
-		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
-
-	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
-
-	/* WaDisableDopClockGating:bdw May not be needed for production */
-	I915_WRITE(GEN7_ROW_CHICKEN2,
-		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
-
 	/* WaSwitchSolVfFArbitrationPriority:bdw */
 	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
 
 	/* WaPsrDPAMaskVBlankInSRD:bdw */
 	I915_WRITE(CHICKEN_PAR1_1,
 		   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);
 
 	/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
 	for_each_pipe(pipe) {
 		I915_WRITE(CHICKEN_PIPESL_1(pipe),
 			   I915_READ(CHICKEN_PIPESL_1(pipe)) |
 			   BDW_DPRS_MASK_VBLANK_SRD);
 	}
 
-	/* Use Force Non-Coherent whenever executing a 3D context. This is a
-	 * workaround for for a possible hang in the unlikely event a TLB
-	 * invalidation occurs during a PSD flush.
-	 */
-	I915_WRITE(HDC_CHICKEN0,
-		   I915_READ(HDC_CHICKEN0) |
-		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
-
 	/* WaVSRefCountFullforceMissDisable:bdw */
 	/* WaDSRefCountFullforceMissDisable:bdw */
 	I915_WRITE(GEN7_FF_THREAD_MODE,
 		   I915_READ(GEN7_FF_THREAD_MODE) &
 		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
 
-	/*
-	 * BSpec recommends 8x4 when MSAA is used,
-	 * however in practice 16x4 seems fastest.
-	 *
-	 * Note that PS/WM thread counts depend on the WIZ hashing
-	 * disable bit, which we don't touch here, but it's good
-	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
-	 */
-	I915_WRITE(GEN7_GT_MODE,
-		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
 
 	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
 		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
 
 	/* WaDisableSDEUnitClockGating:bdw */
 	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
 		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
-
-	/* Wa4x4STCOptimizationDisable:bdw */
-	I915_WRITE(CACHE_MODE_1,
-		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
 }
 
 static void haswell_init_clock_gating(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	ilk_init_lp_watermarks(dev);
 
 	/* L3 caching of data atomics doesn't work -- disable it. */
 	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen8.c b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
index 75ef1b5d..617be0f 100644
--- a/drivers/gpu/drm/i915/intel_renderstate_gen8.c
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
@@ -1,21 +1,78 @@
 #include "intel_renderstate.h"
 
 static const u32 gen8_null_state_relocs[] = {
-	0x00000048,
-	0x00000050,
-	0x00000060,
-	0x000003ec,
+	0x000000a8,
+	0x000000b0,
+	0x000000c0,
+	0x0000044c,
 	-1,
 };
 
 static const u32 gen8_null_state_batch[] = {
+	0x11000001, /* Apply workarounds - start */
+	/* GEN8_ROW_CHICKEN
+	 * WaDisablePartialInstShootdown:bdw
+	 * WaDisableThreadStallDopClockGating:bdw
+	 */
+	0x0000e4f0,
+	0x83208320,
+	0x11000001,
+	/* GEN7_ROW_CHICKEN2
+	 * WaDisableDopClockGating:bdw, may not be needed for production.
+	 */
+	0x0000e4f4,
+	0x00010001,
+	0x11000001,
+	/* HALF_SLICE_CHICKEN3
+	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
+	 * pre-production hardware
+	 */
+	0x0000e184,
+	0x01020102,
+	0x11000001,
+	/* GEN7_HALF_SLICE_CHICKEN1
+	 * Wa: GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE
+	 */
+	0x0000e100,
+	0x04000400,
+	0x11000001,
+	/* COMMON_SLICE_CHICKEN2
+	 * Wa: GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE
+	 */
+	0x00007014,
+	0x00010001,
+	0x11000001,
+	/* HDC_CHICKEN0
+	 * Use Force Non-Coherent whenever executing a 3D context. This is a
+	 * workaround for for a possible hang in the unlikely event a TLB
+	 * invalidation occurs during a PSD flush.
+	 */
+	0x00007300,
+	0x00100010,
+	0x11000001,
+	/* CACHE_MODE_1
+	 * Wa4x4STCOptimizationDisable:bdw
+	 */
+	0x00007004,
+	0x00400040,
+	0x11000001,
+	/*
+	 * BSpec recommends 8x4 when MSAA is used,
+	 * however in practice 16x4 seems fastest.
+	 *
+	 * Note that PS/WM thread counts depend on the WIZ hashing
+	 * disable bit, which we don't touch here, but it's good
+	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
+	 */
+	0x00007008,
+	0x02800200, /* Apply workarounds - end */
 	0x69040000,
 	0x61020001,
 	0x00000000,
 	0x00000000,
 	0x79120000,
 	0x00000000,
 	0x79130000,
 	0x00000000,
 	0x79140000,
 	0x00000000,
@@ -33,35 +90,35 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000001, /* reloc */
 	0x00000000,
 	0xfffff001,
 	0x00001001,
 	0xfffff001,
 	0x00001001,
 	0x78230000,
-	0x000006e0,
+	0x00000720,
 	0x78210000,
-	0x00000700,
+	0x00000740,
 	0x78300000,
 	0x08010040,
 	0x78330000,
 	0x08000000,
 	0x78310000,
 	0x08000000,
 	0x78320000,
 	0x08000000,
 	0x78240000,
-	0x00000641,
+	0x00000681,
 	0x780e0000,
-	0x00000601,
+	0x00000641,
 	0x780d0000,
 	0x00000000,
 	0x78180000,
 	0x00000001,
 	0x78520003,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x78190009,
@@ -192,54 +249,54 @@ static const u32 gen8_null_state_batch[] = {
 	0x78500003,
 	0x00210000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x78130002,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x782a0000,
-	0x00000480,
+	0x000004c0,
 	0x782f0000,
-	0x00000540,
+	0x00000580,
 	0x78140000,
 	0x00000800,
 	0x78170009,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x7820000a,
-	0x00000580,
+	0x000005c0,
 	0x00000000,
 	0x08080000,
 	0x00000000,
 	0x00000000,
 	0x1f000002,
 	0x00060000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x784d0000,
 	0x40000000,
 	0x784f0000,
 	0x80000100,
 	0x780f0000,
-	0x00000740,
+	0x00000780,
 	0x78050006,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x78070003,
 	0x00000000,
@@ -253,21 +310,21 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x78040001,
 	0x00000000,
 	0x00000001,
 	0x79000002,
 	0xffffffff,
 	0x00000000,
 	0x00000000,
 	0x78080003,
 	0x00006000,
-	0x000005e0, /* reloc */
+	0x00000620, /* reloc */
 	0x00000000,
 	0x00000000,
 	0x78090005,
 	0x02000000,
 	0x22220000,
 	0x02f60000,
 	0x11230000,
 	0x02850004,
 	0x11230000,
 	0x784b0000,
@@ -282,30 +339,22 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000001,
 	0x00000000,
 	0x00000000,
 	0x05000000, /* cmds end */
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x000004c0, /* state start */
-	0x00000500,
+	0x00000500, /* state start */
+	0x00000540,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,