From patchwork Fri Feb 18 18:47:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751789 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2B0F7C433F5 for ; Fri, 18 Feb 2022 18:47:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 40E3410E71F; Fri, 18 Feb 2022 18:47:45 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 501AE10E7BD; Fri, 18 Feb 2022 18:47:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210061; x=1676746061; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cc+KBzSk90QHeEXAYZ+q2QPbk0ZOVFRG6Y9ZZAuaGpo=; b=ZTBf3cm4cd2LpfjHbct17HLGisD4EwF2Y0gUITU2lyugkLfrnG4s8EsR wDBhEiQQ1l0brrSK2oaTq/HR6OymGApl3NLwab+5DN79bazPHH4fvnMVr p20cxU3wkJsdX6IS52hPS0IriAJI35Jxc8la4NVilMemYdqV7DjNvgj3H bv+6wd0Q7m5FA9RciANdz8XdAtcaXLK1S/a1tMYSmrdDcHf7VZaZXyg+v axJRRze71pDH25HH0FVKhUDnUWYtkPYGkydltsqh8Nvt3m6aQzPRka2oS oXjrAZzRP/sp/9kDKjRdbnIic3LCrGIm+ZVBe7CDlOvrcBsBzKFdxAmy6 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592912" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592912" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:41 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642012" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:38 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 01/15] drm/i915/dg2: Define GuC firmware version for DG2 Date: Sat, 19 Feb 2022 00:17:38 +0530 Message-Id: <20220218184752.7524-2-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tomasz Mistat , lucas.demarchi@intel.com, Daniele Ceraolo Spurio , John Harrison Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison First release of GuC for DG2. Signed-off-by: John Harrison CC: Tomasz Mistat CC: Ramalingam C CC: Daniele Ceraolo Spurio Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index c88113044494..55512db29183 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -52,6 +52,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw, * firmware as TGL. */ #define INTEL_GUC_FIRMWARE_DEFS(fw_def, guc_def) \ + fw_def(DG2, 0, guc_def(dg2, 69, 0, 3)) \ fw_def(ALDERLAKE_P, 0, guc_def(adlp, 69, 0, 3)) \ fw_def(ALDERLAKE_S, 0, guc_def(tgl, 69, 0, 3)) \ fw_def(DG1, 0, guc_def(dg1, 69, 0, 3)) \ From patchwork Fri Feb 18 18:47:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751790 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8601BC433EF for ; Fri, 18 Feb 2022 18:47:51 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BFD2410E7BD; Fri, 18 Feb 2022 18:47:45 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4105110E71F; Fri, 18 Feb 2022 18:47:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210064; x=1676746064; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C+MkOqiwP4I7fA/1aBEuuIAuM7+18btfM7ehJPLyiQA=; b=XdIEDToNVfXBF0Kb8x8hsXoPHMhB7lQ81wM0KtIL2w24cf52z+2mRokK CZjTNx+pt1RWJpMlsDVaYfnI3LZSFK6+EfhFMFohOG2JjTLKmWHzEGrE3 6LcJ5dWnjuZSyqbR/CjAGDCOCP24bDarZPRpAI5M/YQ+DnMv5TI+N4gTR LaClacDG+DjD0SEnJIZ4nBwmKeSfp0NjGCw9dIgRxzvt8sK7YX54p+WWD aRo/rEikPAS5rsW6VN8Jht7eiPYG4iYuuphr7ZasaYsb4vn9UJjoI8vUo cWN596bP3SDKGzirBg0ZLUaZExstaN9nOP5n4p6Eoyy5d75dzUy4+7LTU w==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592933" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592933" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:44 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642047" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:41 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 02/15] drm/i915: Fix for PHY_MISC_TC1 offset Date: Sat, 19 Feb 2022 00:17:39 +0530 Message-Id: <20220218184752.7524-3-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lucas.demarchi@intel.com, Uma Shankar , =?utf-8?q?Jouni_H=C3=B6gander?= Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Jouni Högander Currently ICL_PHY_MISC macro is returning offset 0x64C10 for PHY_E. The PORT_TC1 port is not yet enabled properly in the driver, but intel_phy_snps.c is relying on intel_phy_is_snps() to filter out unavailable phys. That function was already considering the last phy as available. Just correct the offset of the last phy to 0x64C14 as the rest of the support for it is coming on next commits. Signed-off-by: Matt Roper Signed-off-by: Jouni Högander Signed-off-by: Ramalingam C Reviewed-by: Uma Shankar Reviewed-by: Lucas De Marchi Acked-by: Ville Syrjälä Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/i915/display/intel_snps_phy.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 6 ++++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_snps_phy.c b/drivers/gpu/drm/i915/display/intel_snps_phy.c index 8fd00de981fc..4cdce0116883 100644 --- a/drivers/gpu/drm/i915/display/intel_snps_phy.c +++ b/drivers/gpu/drm/i915/display/intel_snps_phy.c @@ -32,7 +32,7 @@ void intel_snps_phy_wait_for_calibration(struct drm_i915_private *i915) if (!intel_phy_is_snps(i915, phy)) continue; - if (intel_de_wait_for_clear(i915, ICL_PHY_MISC(phy), + if (intel_de_wait_for_clear(i915, DG2_PHY_MISC(phy), DG2_PHY_DP_TX_ACK_MASK, 25)) drm_err(&i915->drm, "SNPS PHY %c failed to calibrate after 25ms.\n", phy_name(phy)); diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e2e9f543fb83..cc13918fe246 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -9361,8 +9361,10 @@ enum skl_power_gate { #define _ICL_PHY_MISC_A 0x64C00 #define _ICL_PHY_MISC_B 0x64C04 -#define ICL_PHY_MISC(port) _MMIO_PORT(port, _ICL_PHY_MISC_A, \ - _ICL_PHY_MISC_B) +#define _DG2_PHY_MISC_TC1 0x64C14 /* TC1="PHY E" but offset as if "PHY F" */ +#define ICL_PHY_MISC(port) _MMIO_PORT(port, _ICL_PHY_MISC_A, _ICL_PHY_MISC_B) +#define DG2_PHY_MISC(port) ((port) == PHY_E ? _MMIO(_DG2_PHY_MISC_TC1) : \ + ICL_PHY_MISC(port)) #define ICL_PHY_MISC_MUX_DDID (1 << 28) #define ICL_PHY_MISC_DE_IO_COMP_PWR_DOWN (1 << 23) #define DG2_PHY_DP_TX_ACK_MASK REG_GENMASK(23, 20) From patchwork Fri Feb 18 18:47:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751791 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2A2F3C433EF for ; Fri, 18 Feb 2022 18:47:59 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0C38310E7EA; Fri, 18 Feb 2022 18:47:54 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9691210E7D9; Fri, 18 Feb 2022 18:47:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210067; x=1676746067; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jBsr9T5v4RKnfxGcFDbrEZQsh/z87jXxkIM5XNK6w4w=; b=F4vsEOTeAuOP5Lx/NUGexq7m1sf2u86fuOETCPfIZ09xrNd/rjhErzho fPsvnfjc9bTLkqIVFbNWREpvaUr+gigBn6Sb7nu9At7Weiu41m5lMA53R 2eIJYXHuR+iWeJU+j+QxGEDg55SsG/A94ivXxJO9bqyzFdJK4IBwshxHt j5WP4LZ4knABaV0nYy+86L8/7VvARDihuI7i6RHVsBGrS8GQWteCl19M0 nAC15PlGUdEJU2Pn5/S/hspb2d87VwNBqr5PGNQ8E9dY09nQzACHCaUpK VTUn49UFZI//WNiHQzKd1LeI4d9Wnt0YiPQf5pYLhMuLEwwUPs7VRhcij A==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592946" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592946" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:47 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642078" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:44 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 03/15] drm/i915/dg2: Drop 38.4 MHz MPLLB tables Date: Sat, 19 Feb 2022 00:17:40 +0530 Message-Id: <20220218184752.7524-4-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Anusha Srivatsa , lucas.demarchi@intel.com, =?utf-8?q?Jos=C3=A9_Roberto_de_Souza?= , Uma Shankar Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matt Roper Our early understanding of DG2 was incorrect; since the 5th display isn't actually a Type-C output, 38.4 MHz input clocks are never used on this platform and we can drop the corresponding MPLLB tables. Cc: Anusha Srivatsa Cc: José Roberto de Souza Signed-off-by: Matt Roper Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi Reviewed-by: Uma Shankar Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/i915/display/intel_snps_phy.c | 208 +----------------- 1 file changed, 1 insertion(+), 207 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_snps_phy.c b/drivers/gpu/drm/i915/display/intel_snps_phy.c index 4cdce0116883..7e6245b97fed 100644 --- a/drivers/gpu/drm/i915/display/intel_snps_phy.c +++ b/drivers/gpu/drm/i915/display/intel_snps_phy.c @@ -250,197 +250,6 @@ static const struct intel_mpllb_state * const dg2_dp_100_tables[] = { NULL, }; -/* - * Basic DP link rates with 38.4 MHz reference clock. - */ - -static const struct intel_mpllb_state dg2_dp_rbr_38_4 = { - .clock = 162000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_TX_CLK_DIV, 2) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 2), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 304), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 49152), -}; - -static const struct intel_mpllb_state dg2_dp_hbr1_38_4 = { - .clock = 270000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_TX_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 3), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 248), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 40960), -}; - -static const struct intel_mpllb_state dg2_dp_hbr2_38_4 = { - .clock = 540000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 3), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 248), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 40960), -}; - -static const struct intel_mpllb_state dg2_dp_hbr3_38_4 = { - .clock = 810000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 6) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 26) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 388), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 61440), -}; - -static const struct intel_mpllb_state dg2_dp_uhbr10_38_4 = { - .clock = 1000000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 26) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_MULTIPLIER, 8) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_WORD_DIV2_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DP2_MODE, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_SHIM_DIV32_CLK_SEL, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 488), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 3), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_REM, 2) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 27306), - - /* - * SSC will be enabled, DP UHBR has a minimum SSC requirement. - */ - .mpllb_sscen = - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_PEAK, 76800), - .mpllb_sscstep = - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_STEPSIZE, 129024), -}; - -static const struct intel_mpllb_state dg2_dp_uhbr13_38_4 = { - .clock = 1350000, - .ref_control = - REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1), - .mpllb_cp = - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 6) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 56) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127), - .mpllb_div = - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_CLK_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_MULTIPLIER, 8) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_WORD_DIV2_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_DP2_MODE, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 3), - .mpllb_div2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 670), - .mpllb_fracn1 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1), - .mpllb_fracn2 = - REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 36864), - - /* - * SSC will be enabled, DP UHBR has a minimum SSC requirement. - */ - .mpllb_sscen = - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_EN, 1) | - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_PEAK, 103680), - .mpllb_sscstep = - REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_STEPSIZE, 174182), -}; - -static const struct intel_mpllb_state * const dg2_dp_38_4_tables[] = { - &dg2_dp_rbr_38_4, - &dg2_dp_hbr1_38_4, - &dg2_dp_hbr2_38_4, - &dg2_dp_hbr3_38_4, - &dg2_dp_uhbr10_38_4, - &dg2_dp_uhbr13_38_4, - NULL, -}; - /* * eDP link rates with 100 MHz reference clock. */ @@ -749,22 +558,7 @@ intel_mpllb_tables_get(struct intel_crtc_state *crtc_state, if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_EDP)) { return dg2_edp_tables; } else if (intel_crtc_has_dp_encoder(crtc_state)) { - /* - * FIXME: Initially we're just enabling the "combo" outputs on - * port A-D. The MPLLB for those ports takes an input from the - * "Display Filter PLL" which always has an output frequency - * of 100 MHz, hence the use of the _100 tables below. - * - * Once we enable port TC1 it will either use the same 100 MHz - * "Display Filter PLL" (when strapped to support a native - * display connection) or different 38.4 MHz "Filter PLL" when - * strapped to support a USB connection, so we'll need to check - * that to determine which table to use. - */ - if (0) - return dg2_dp_38_4_tables; - else - return dg2_dp_100_tables; + return dg2_dp_100_tables; } else if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI)) { return dg2_hdmi_tables; } From patchwork Fri Feb 18 18:47:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751793 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5E08EC433F5 for ; Fri, 18 Feb 2022 18:48:04 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id ADD9010E852; Fri, 18 Feb 2022 18:47:55 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 757CD10E7EA; Fri, 18 Feb 2022 18:47:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210070; x=1676746070; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=avSnmV3TzE2UKZCn7duZQYHZjAWODut4c320p8f2VnY=; b=lO5GVGceX/EK13xw65QfKokOUUFpA8cHVK5E+fthdeVAX6ZjJq3vK8Nh pmpRg23QVMcyObfanxlLneBrqckXpOWwKhUUiuUhiOZwRi7FON+3l1M/x ComXW7JfVcOrpa4GsAcBrAFAHTN5mjQgW0EbVbxSxHx/x1WjTi+QLG4ZO S7k3wmEk9YhdEYwVtu+qiljdanFnPu2ma3lXmrMiLT1IculhJ6jWWQ3di ZLSqB1fDwCuEuUCD1jeYntt3b84sdVjfxfXCRTqWKM3lx5LS4j93ZJP1b 0an08dksNCMC18IQTVTylFi9mMOIfFaQX9IN3HS1kku6fM1lN/obfl9M1 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592961" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592961" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:50 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642119" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:47 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 04/15] drm/i915/dg2: Enable 5th port Date: Sat, 19 Feb 2022 00:17:41 +0530 Message-Id: <20220218184752.7524-5-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Swathi Dhanavanthri , lucas.demarchi@intel.com, =?utf-8?q?Jos=C3=A9_Roberto_de_Souza?= Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matt Roper DG2 supports a 5th display output which the hardware refers to as "TC1," even though it isn't a Type-C output. This behaves similarly to the TC1 on past platforms with just a couple minor differences: * DG2's TC1 bit in SDEISR is at bit 25 rather than 24 as it is on ICP/TGP/ADP. * DG2 doesn't need the hpd inversion setting that we had to use on DG1 v2: intel_ddi_init(dev_priv, PORT_TC1); [Matt] Cc: Swathi Dhanavanthri Cc: Lucas De Marchi Cc: José Roberto de Souza Signed-off-by: Matt Roper Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi Signed-off-by: Lucas De Marchi --- drivers/gpu/drm/i915/display/intel_display.c | 1 + drivers/gpu/drm/i915/display/intel_gmbus.c | 16 ++++++++++++++-- drivers/gpu/drm/i915/i915_irq.c | 5 ++++- drivers/gpu/drm/i915/i915_reg.h | 1 + 4 files changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index aaf2aee4da35..69e15ad2c253 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -8757,6 +8757,7 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv) intel_ddi_init(dev_priv, PORT_B); intel_ddi_init(dev_priv, PORT_C); intel_ddi_init(dev_priv, PORT_D_XELPD); + intel_ddi_init(dev_priv, PORT_TC1); } else if (IS_ALDERLAKE_P(dev_priv)) { intel_ddi_init(dev_priv, PORT_A); intel_ddi_init(dev_priv, PORT_B); diff --git a/drivers/gpu/drm/i915/display/intel_gmbus.c b/drivers/gpu/drm/i915/display/intel_gmbus.c index 6ce8c10fe975..2fad03250661 100644 --- a/drivers/gpu/drm/i915/display/intel_gmbus.c +++ b/drivers/gpu/drm/i915/display/intel_gmbus.c @@ -98,11 +98,21 @@ static const struct gmbus_pin gmbus_pins_dg1[] = { [GMBUS_PIN_4_CNP] = { "dpd", GPIOE }, }; +static const struct gmbus_pin gmbus_pins_dg2[] = { + [GMBUS_PIN_1_BXT] = { "dpa", GPIOB }, + [GMBUS_PIN_2_BXT] = { "dpb", GPIOC }, + [GMBUS_PIN_3_BXT] = { "dpc", GPIOD }, + [GMBUS_PIN_4_CNP] = { "dpd", GPIOE }, + [GMBUS_PIN_9_TC1_ICP] = { "tc1", GPIOJ }, +}; + /* pin is expected to be valid */ static const struct gmbus_pin *get_gmbus_pin(struct drm_i915_private *dev_priv, unsigned int pin) { - if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1) + if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG2) + return &gmbus_pins_dg2[pin]; + else if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1) return &gmbus_pins_dg1[pin]; else if (INTEL_PCH_TYPE(dev_priv) >= PCH_ICP) return &gmbus_pins_icp[pin]; @@ -123,7 +133,9 @@ bool intel_gmbus_is_valid_pin(struct drm_i915_private *dev_priv, { unsigned int size; - if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1) + if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG2) + size = ARRAY_SIZE(gmbus_pins_dg2); + else if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1) size = ARRAY_SIZE(gmbus_pins_dg1); else if (INTEL_PCH_TYPE(dev_priv) >= PCH_ICP) size = ARRAY_SIZE(gmbus_pins_icp); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index fdd568ba4a16..4d81063b128c 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -179,6 +179,7 @@ static const u32 hpd_sde_dg1[HPD_NUM_PINS] = { [HPD_PORT_B] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_B), [HPD_PORT_C] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_C), [HPD_PORT_D] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_D), + [HPD_PORT_TC1] = SDE_TC_HOTPLUG_DG2(HPD_PORT_TC1), }; static void intel_hpd_init_pins(struct drm_i915_private *dev_priv) @@ -4424,7 +4425,9 @@ void intel_irq_init(struct drm_i915_private *dev_priv) if (I915_HAS_HOTPLUG(dev_priv)) dev_priv->hotplug_funcs = &i915_hpd_funcs; } else { - if (HAS_PCH_DG1(dev_priv)) + if (HAS_PCH_DG2(dev_priv)) + dev_priv->hotplug_funcs = &icp_hpd_funcs; + else if (HAS_PCH_DG1(dev_priv)) dev_priv->hotplug_funcs = &dg1_hpd_funcs; else if (DISPLAY_VER(dev_priv) >= 11) dev_priv->hotplug_funcs = &gen11_hpd_funcs; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index cc13918fe246..986fb30da9ab 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6059,6 +6059,7 @@ /* south display engine interrupt: ICP/TGP */ #define SDE_GMBUS_ICP (1 << 23) #define SDE_TC_HOTPLUG_ICP(hpd_pin) REG_BIT(24 + _HPD_PIN_TC(hpd_pin)) +#define SDE_TC_HOTPLUG_DG2(hpd_pin) REG_BIT(25 + _HPD_PIN_TC(hpd_pin)) /* sigh */ #define SDE_DDI_HOTPLUG_ICP(hpd_pin) REG_BIT(16 + _HPD_PIN_DDI(hpd_pin)) #define SDE_DDI_HOTPLUG_MASK_ICP (SDE_DDI_HOTPLUG_ICP(HPD_PORT_D) | \ SDE_DDI_HOTPLUG_ICP(HPD_PORT_C) | \ From patchwork Fri Feb 18 18:47:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751792 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 28503C433EF for ; Fri, 18 Feb 2022 18:48:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 04B7510E821; Fri, 18 Feb 2022 18:47:55 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 290CB10E7EA; Fri, 18 Feb 2022 18:47:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210073; x=1676746073; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qYqiPkht20qnR8at+2JndMN4rVVZMVGIhD7k7npCWQ8=; b=ObOZuITF1oOvWuIPJ1MiQ0bLzlTVC+I7DqpvFryD4KpN/WidWtGqbTRq aMGjptGkw17ucSepFDgoyrwApfHOMlSJS6zKqQplvqRwaOzDr1netojcj oFEQa0hfvkfXuvqqtCQKfDUJiYbQfxEvzIJpO7oodlw2iqZOWwugHi9oL MlkoCJ/taf4xRD9CtW6AoilawT6hcZlGU+bN2uk2QWJ6KHB6FpYwlP1YF dUeNzmFMFZEhYUdCe9Yr+5OLxv5T3QGAoScqvV6VyDdK3joWq9BovUTs7 1jfwyD0+Gbx9OrSNY20/2w1qPRzqIt1EHLpaYo/oHRTS2ZP4sIZPtnKcK Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592968" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592968" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:53 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642140" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:50 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 05/15] drm/i915: add needs_compact_pt flag Date: Sat, 19 Feb 2022 00:17:42 +0530 Message-Id: <20220218184752.7524-6-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages. With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access. v6: * minor doc formatting Suggested-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_drv.h | 11 ++++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index f600d1cb01b3..4a3ac66e777a 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1340,12 +1340,17 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. + * device local memory access. */ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) +/* + * Set this flag when platform doesn't allow both 64k pages and 4k pages in + * the same PT. this flag means we need to support compact PT layout for the + * ppGTT when using the 64K GTT pages. + */ +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) + #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 91677a9f330c..8df8887d76ae 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1030,6 +1030,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1048,6 +1049,7 @@ static const struct intel_device_info dg2_info = { PLATFORM(INTEL_DG2), .has_guc_deprivilege = 1, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 27dcfe6f2429..f75673da768d 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -131,6 +131,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \ From patchwork Fri Feb 18 18:47:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751798 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2435BC433F5 for ; Fri, 18 Feb 2022 18:48:29 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2D14410E972; Fri, 18 Feb 2022 18:48:22 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 485AB10E87F; Fri, 18 Feb 2022 18:47:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210076; x=1676746076; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UeN0ZjQhiLjtBTrnlCnYrX20Rtsz/dqkKFMFt39Vyu0=; b=g1PriumcNgmSJYtbauDE+dQ/q+KFjpw5OzbEZ3xyVWascMdtpGFDP54Y YowzzgH9Mrspa2PMj/Evl+1FJ/1mT4NoiOBpMWEwCr7YvQyQaORC9LJKw Q8fAPxwyEhAVNX0qd/kAxnGDQ4S9BUcbi6Fc4YkPraJxjKR0ij6Va0vQR HgbYVSTU2xvneewMltz2KFuOGIG+pmAS2gp18C7er9Al33iDqQy+51Sr+ REgLmxKFTqDdzm0pM+BENibvR1cNs7g5kA3KOI8KtCQpLrMeKdW8YQ/+t +g51ffAR3ZU1DTZyeLZE7jhjuC2yQ3lWNlaCWRzx31KtHhDgdJAEhFwvQ g==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592977" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592977" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:56 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642172" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:53 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 06/15] drm/i915: enforce min GTT alignment for discrete cards Date: Sat, 19 Feb 2022 00:17:43 +0530 Message-Id: <20220218184752.7524-7-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Matthew Auld , Rodrigo Vivi Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt. We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account. For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW. v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region v6: * tiled_blits_create correctly pick largest required alignment v8: * i915_vm_min_alignment protect against array overflow for mock region Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Reviewed-by: Thomas Hellström Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../i915/gem/selftests/i915_gem_client_blt.c | 21 ++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 22 +++++ drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++------- 5 files changed, 119 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index 8f28e46e8ee5..ddd0772fd828 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -40,6 +40,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -411,14 +412,19 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; } - hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM)); + + hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align); hole_size *= 2; /* room to maneuver */ - hole_size += 2 * I915_GTT_MIN_ALIGNMENT; + hole_size += 2 * t->align; /* padding on either side */ mutex_lock(&t->ce->vm->mutex); memset(&hole, 0, sizeof(hole)); err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole, - hole_size, 0, I915_COLOR_UNEVICTABLE, + hole_size, t->align, + I915_COLOR_UNEVICTABLE, 0, U64_MAX, DRM_MM_INSERT_BEST); if (!err) @@ -429,7 +435,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_put; } - t->hole = hole.start + I915_GTT_MIN_ALIGNMENT; + t->hole = hole.start + t->align; pr_info("Using hole at %llx\n", t->hole); err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng); @@ -456,7 +462,7 @@ static void tiled_blits_destroy(struct tiled_blits *t) static int tiled_blits_prepare(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = PAGE_ALIGN(t->width * t->height * 4); + u64 offset = round_up(t->width * t->height * 4, t->align); u32 *map; int err; int i; @@ -487,8 +493,7 @@ static int tiled_blits_prepare(struct tiled_blits *t, static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = - round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT); + u64 offset = round_up(t->width * t->height * 4, 2 * t->align); int err; /* We want to check position invariant tiling across GTT eviction */ @@ -501,7 +506,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng) /* Reposition so that we overlap the old addresses, and slightly off */ err = tiled_blit(t, - &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT, + &t->buffers[2], t->hole + t->align, &t->buffers[1], t->hole + 3 * offset / 2); if (err) return err; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 49a8fb63e6e5..c548c193cd35 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -225,6 +225,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) GEM_BUG_ON(!vm->total); drm_mm_init(&vm->mm, 0, vm->total); + + memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, + ARRAY_SIZE(vm->min_alignment)); + + if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; + } else if (HAS_64K_PAGES(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K; + } + vm->mm.head_node.color = I915_COLOR_UNEVICTABLE; INIT_LIST_HEAD(&vm->bound_list); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 8073438b67c8..6cd518a3277c 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -29,6 +29,8 @@ #include "i915_selftest.h" #include "i915_vma_resource.h" #include "i915_vma_types.h" +#include "i915_params.h" +#include "intel_memory_region.h" #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) @@ -223,6 +225,7 @@ struct i915_address_space { struct device *dma; u64 total; /* size addr space maps (ex. 2GB for ggtt) */ u64 reserved; /* size addr space reserved */ + u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1]; unsigned int bind_async_flags; @@ -384,6 +387,25 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm) return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K); } +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm, + enum intel_memory_type type) +{ + /* avoid INTEL_MEMORY_MOCK overflow */ + if ((int)type >= ARRAY_SIZE(vm->min_alignment)) + type = INTEL_MEMORY_SYSTEM; + + return vm->min_alignment[type]; +} + +static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm, + struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); + enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM; + + return i915_vm_min_alignment(vm, type); +} + static inline bool i915_vm_has_cache_coloring(struct i915_address_space *vm) { diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 845cd88f8313..3558b16a929c 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -757,6 +757,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE); GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE)); + alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj)); + /* + * for compact-pt we round up the reservation to prevent + * any smaller pages being used within the same PDE + */ + if (NEEDS_COMPACT_PT(vma->vm->i915)) + size = round_up(size, alignment); + /* If binding the object/GGTT view requires more space than the entire * aperture has, reject it early before evicting everything in a vain * attempt to find space. @@ -769,6 +777,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, } color = 0; + if (i915_vm_has_cache_coloring(vma->vm)) color = vma->obj->cache_level; diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index e7e6c4b2c81d..0d80509ef3c4 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -239,6 +239,8 @@ static int lowlevel_hole(struct i915_address_space *vm, u64 hole_start, u64 hole_end, unsigned long end_time) { + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); I915_RND_STATE(seed_prng); struct i915_vma_resource *mock_vma_res; unsigned int size; @@ -252,9 +254,10 @@ static int lowlevel_hole(struct i915_address_space *vm, I915_RND_SUBSTATE(prng, seed_prng); struct drm_i915_gem_object *obj; unsigned int *order, count, n; - u64 hole_size; + u64 hole_size, aligned_size; - hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -275,8 +278,8 @@ static int lowlevel_hole(struct i915_address_space *vm, } GEM_BUG_ON(!order); - GEM_BUG_ON(count * BIT_ULL(size) > vm->total); - GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end); + GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total); + GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end); /* Ignore allocation failures (i.e. don't report them as * a test failure) as we are purposefully allocating very @@ -299,10 +302,10 @@ static int lowlevel_hole(struct i915_address_space *vm, } for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref; - GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); + GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total); if (igt_timeout(end_time, "%s timed out before %d/%d\n", @@ -345,7 +348,7 @@ static int lowlevel_hole(struct i915_address_space *vm, } mock_vma_res->bi.pages = obj->mm.pages; - mock_vma_res->node_size = BIT_ULL(size); + mock_vma_res->node_size = BIT_ULL(aligned_size); mock_vma_res->start = addr; with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref) @@ -356,7 +359,7 @@ static int lowlevel_hole(struct i915_address_space *vm, i915_random_reorder(order, count, &prng); for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref; GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); @@ -400,8 +403,10 @@ static int fill_hole(struct i915_address_space *vm, { const u64 hole_size = hole_end - hole_start; struct drm_i915_gem_object *obj; + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); const unsigned long max_pages = - min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT); + min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment)); const unsigned long max_step = max(int_sqrt(max_pages), 2UL); unsigned long npages, prime, flags; struct i915_vma *vma; @@ -442,14 +447,17 @@ static int fill_hole(struct i915_address_space *vm, offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue; if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; } err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -471,22 +479,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma); if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue; if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; } if (!drm_mm_node_allocated(&vma->node) || @@ -507,22 +518,25 @@ static int fill_hole(struct i915_address_space *vm, } if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue; if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; } err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -544,22 +558,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma); if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue; if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; } if (!drm_mm_node_allocated(&vma->node) || @@ -580,9 +597,9 @@ static int fill_hole(struct i915_address_space *vm, } if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } } @@ -612,6 +629,7 @@ static int walk_hole(struct i915_address_space *vm, const u64 hole_size = hole_end - hole_start; const unsigned long max_pages = min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT); + unsigned long min_alignment; unsigned long flags; u64 size; @@ -621,6 +639,8 @@ static int walk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL; + min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + for_each_prime_number_from(size, 1, max_pages) { struct drm_i915_gem_object *obj; struct i915_vma *vma; @@ -639,7 +659,7 @@ static int walk_hole(struct i915_address_space *vm, for (addr = hole_start; addr + obj->base.size < hole_end; - addr += obj->base.size) { + addr += round_up(obj->base.size, min_alignment)) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n", @@ -691,6 +711,7 @@ static int pot_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; struct i915_vma *vma; + unsigned int min_alignment; unsigned long flags; unsigned int pot; int err = 0; @@ -699,6 +720,8 @@ static int pot_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL; + min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj); @@ -711,13 +734,13 @@ static int pot_hole(struct i915_address_space *vm, /* Insert a pair of pages across every pot boundary within the hole */ for (pot = fls64(hole_end - 1) - 1; - pot > ilog2(2 * I915_GTT_PAGE_SIZE); + pot > ilog2(2 * min_alignment); pot--) { u64 step = BIT_ULL(pot); u64 addr; - for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; - addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; + for (addr = round_up(hole_start + min_alignment, step) - min_alignment; + addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment; addr += step) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -762,6 +785,7 @@ static int drunk_hole(struct i915_address_space *vm, unsigned long end_time) { I915_RND_STATE(prng); + unsigned int min_alignment; unsigned int size; unsigned long flags; @@ -769,15 +793,18 @@ static int drunk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL; + min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (size = 12; (hole_end - hole_start) >> size; size++) { struct drm_i915_gem_object *obj; unsigned int *order, count, n; struct i915_vma *vma; - u64 hole_size; + u64 hole_size, aligned_size; int err = -ENODEV; - hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -817,7 +844,7 @@ static int drunk_hole(struct i915_address_space *vm, GEM_BUG_ON(vma->size != BIT_ULL(size)); for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -869,11 +896,14 @@ static int __shrink_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + unsigned int min_alignment; unsigned int order = 12; LIST_HEAD(objects); int err = 0; u64 addr; + min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (addr = hole_start; addr < hole_end; ) { struct i915_vma *vma; @@ -914,7 +944,7 @@ static int __shrink_hole(struct i915_address_space *vm, } i915_vma_unpin(vma); - addr += size; + addr += round_up(size, min_alignment); /* * Since we are injecting allocation faults at random intervals, From patchwork Fri Feb 18 18:47:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751794 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 92459C433FE for ; Fri, 18 Feb 2022 18:48:06 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6A3E110E844; Fri, 18 Feb 2022 18:48:01 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6AE0D10E844; Fri, 18 Feb 2022 18:47:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210079; x=1676746079; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=a+9G6wfsuv2PJnDwgnJetpxq1zlk2LzYMwjkIZqO6S8=; b=KAK378mfpt9cd85bg57u7bCT5eATUX07iucFII9+fkFc1Xg+eobMBr6Q vrmeL95GgD2cB63HcQxIHScvyiuIFL3oNzWGeysO0oOw/3cPbzLtCJP7v ZbpLFkbU47EyM5iBbqIEzPKn5xaohclhtsVuo59Jp/s/hpQqLaQ0sz+u0 oP3+DfQ389jtrY+By0hN1dGbAdxr+MLDre21X16ys/LBCCd+8jqGboCVx svNEUDDjTztXOI8cPhFuFbCJEpt/VDcfbiZ36WVVCuLl/NXeewZRy//Du jmjvN0WcxUaWJB2gagfx51Oq2NlWpTjbRGKg6E1EU4jurxurWiLkcZgFG w==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592982" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592982" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:59 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642201" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:56 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 07/15] drm/i915: support 64K GTT pages for discrete cards Date: Sat, 19 Feb 2022 00:17:44 +0530 Message-Id: <20220218184752.7524-8-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , kernel test robot , lucas.demarchi@intel.com, Matthew Auld , Rodrigo Vivi Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW. v4: don't return uninitialized err in igt_ppgtt_compact Reported-by: kernel test robot Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Reviewed-by: Thomas Hellström Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 8424ee8c5eb8..0528fe1fc9b3 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1479,6 +1479,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; } +static int igt_ppgtt_compact(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + int err; + + /* + * Simple test to catch issues with compact 64K pages -- since the pt is + * compacted to 256B that gives us 32 entries per pt, however since the + * backing page for the pt is 4K, any extra entries we might incorrectly + * write out should be ignored by the HW. If ever hit such a case this + * test should catch it since some of our writes would land in scratch. + */ + + if (!HAS_64K_PAGES(i915)) { + pr_info("device lacks compact 64K page support, skipping\n"); + return 0; + } + + if (!HAS_LMEM(i915)) { + pr_info("device lacks LMEM support, skipping\n"); + return 0; + } + + /* We want the range to cover multiple page-table boundaries. */ + obj = i915_gem_object_create_lmem(i915, SZ_4M, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) + goto out_put; + + if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) { + pr_info("LMEM compact unable to allocate huge-page(s)\n"); + goto out_unpin; + } + + /* + * Disable 2M GTT pages by forcing the page-size to 64K for the GTT + * insertion. + */ + obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K; + + err = igt_write_huge(i915, obj); + if (err) + pr_err("LMEM compact write-huge failed\n"); + +out_unpin: + i915_gem_object_unpin_pages(obj); +out_put: + i915_gem_object_put(obj); + + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg; @@ -1736,6 +1795,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check), + SUBTEST(igt_ppgtt_compact), }; if (!HAS_PPGTT(i915)) { diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count; + unsigned int pte = gen8_pd_index(start, 0); + unsigned int num_ptes; u64 *vaddr; count = gen8_pt_count(start, end); @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used)); + num_ptes = count; + if (pt->is_compact) { + GEM_BUG_ON(num_ptes % 16); + GEM_BUG_ON(pte % 16); + num_ptes /= 16; + pte /= 16; + } + vaddr = px_vaddr(pt); - memset64(vaddr + gen8_pd_index(start, 0), + memset64(vaddr + pte, vm->scratch[0]->encode, - count); + num_ptes); atomic_sub(count, &pt->used); start += count; @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; } +static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + struct sgt_dma *iter, + enum i915_cache_level cache_level, + u32 flags) +{ + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); + unsigned int rem = sg_dma_len(iter->sg); + u64 start = vma_res->start; + + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); + + do { + struct i915_page_directory * const pdp = + gen8_pdp_for_page_address(vm, start); + struct i915_page_directory * const pd = + i915_pd_entry(pdp, __gen8_pte_index(start, 2)); + struct i915_page_table *pt = + i915_pt_entry(pd, __gen8_pte_index(start, 1)); + gen8_pte_t encode = pte_encode; + unsigned int page_size; + gen8_pte_t *vaddr; + u16 index, max; + + max = I915_PDES; + + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && + rem >= I915_GTT_PAGE_SIZE_2M && + !__gen8_pte_index(start, 0)) { + index = __gen8_pte_index(start, 1); + encode |= GEN8_PDE_PS_2M; + page_size = I915_GTT_PAGE_SIZE_2M; + + vaddr = px_vaddr(pd); + } else { + if (encode & GEN12_PPGTT_PTE_LM) { + GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); + GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); + GEM_BUG_ON(!IS_ALIGNED(iter->dma, + I915_GTT_PAGE_SIZE_64K)); + + index = __gen8_pte_index(start, 0) / 16; + page_size = I915_GTT_PAGE_SIZE_64K; + + max /= 16; + + vaddr = px_vaddr(pd); + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; + + pt->is_compact = true; + } else { + GEM_BUG_ON(pt->is_compact); + index = __gen8_pte_index(start, 0); + page_size = I915_GTT_PAGE_SIZE; + } + + vaddr = px_vaddr(pt); + } + + do { + GEM_BUG_ON(rem < page_size); + vaddr[index++] = encode | iter->dma; + + start += page_size; + iter->dma += page_size; + rem -= page_size; + if (iter->dma >= iter->max) { + iter->sg = __sg_next(iter->sg); + if (!iter->sg) + break; + + rem = sg_dma_len(iter->sg); + if (!rem) + break; + + iter->dma = sg_dma_address(iter->sg); + iter->max = iter->dma + rem; + + if (unlikely(!IS_ALIGNED(iter->dma, page_size))) + break; + } + } while (rem >= page_size && index < max); + + vma_res->page_sizes_gtt |= page_size; + } while (iter->sg && sg_dma_len(iter->sg)); +} + static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter, @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res); if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + if (HAS_64K_PAGES(vm->i915)) + xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + else + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 6cd518a3277c..5e038cef0d9f 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t; #define GEN12_GGTT_PTE_LM BIT_ULL(1) +#define GEN12_PDE_64K BIT(6) + /* * Cacheability Control is a 4-bit value. The low three bits are stored in bits * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE. @@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; }; + bool is_compact; }; struct i915_page_directory { diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); } + pt->is_compact = false; atomic_set(&pt->used, 0); return pt; } From patchwork Fri Feb 18 18:47:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751799 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F102EC433FE for ; Fri, 18 Feb 2022 18:48:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6FAAA10E954; Fri, 18 Feb 2022 18:48:21 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 617ED10E862; Fri, 18 Feb 2022 18:48:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210082; x=1676746082; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cMX+v9LBjXGksbfLw0LbeVC5ne2Nz9FNpGhCT7RWb5k=; b=CDuy/hjzWt/eKW2ZWDmyE6W0IsUAaYIxhz/A+7Z8CbzuxuG3dbGm4/hb T5z4c52ydy+JIPpQx/dMWjgMJJch2S0lNL3cFvOklzAlDAxACd4v2cqIO zIGS9TH6kWfJ5tbo5hXsrOE8GSDW+RbCONogW/JCIhrl+oy5jQ41uz5lE DiiYx38m07NibRbqi/woINMm1L1OUUMqsCKnXqx7rkWKbNWeM7MID2ylw aIu7IwA9O4INDZZtSW9xxyiRGa9H0Uil0QjAXLqWm9OB5MHCo4Ix4VeSj Rvh14Ujz8qmh9lFnqphQczL162vU9BoRMz5OoQvvXn2OFNRQ4aZIMTi31 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592991" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592991" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:02 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642238" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:47:59 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 08/15] drm/i915: add gtt misalignment test Date: Sat, 19 Feb 2022 00:17:45 +0530 Message-Id: <20220218184752.7524-9-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Robert Beckett , =?utf-8?q?Thomas_Hellstr?= =?utf-8?q?=C3=B6m?= , kernel test robot , lucas.demarchi@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Robert Beckett add test to check handling of misaligned offsets and sizes v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot v6: * use NEEDS_COMPACT_PT instead of hard coding for DG2 v7: * use i915_vma_unbind_unlocked in misalignment test v8: * handle stolen smem region returning -ENODEV due to uninitialized on some setups * avoid trying to test bad alignments on single page hole regions Signed-off-by: Robert Beckett Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 126 ++++++++++++++++++ 1 file changed, 126 insertions(+) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 0d80509ef3c4..cc814abb0105 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -26,10 +26,12 @@ #include #include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/i915_gem_internal.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" +#include "gt/intel_gtt.h" #include "i915_random.h" #include "i915_selftest.h" @@ -1068,6 +1070,118 @@ static int shrink_boom(struct i915_address_space *vm, return err; } +static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size; + bool is_stolen = mr->type == INTEL_MEMORY_STOLEN_SYSTEM || + mr->type == INTEL_MEMORY_STOLEN_LOCAL; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) { + /* if iGVT-g or DMAR is active, stolen mem will be uninitialized */ + if (PTR_ERR(obj) == -ENODEV && is_stolen) + return 0; + return PTR_ERR(obj); + } + + vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + } + + err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma); + + if (!drm_mm_node_allocated(&vma->node)) { + err = -EINVAL; + goto err_put; + } + + if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + } + + expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size; + + if (NEEDS_COMPACT_PT(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* compact-pt should expand lmem node to 2MB */ + expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K); + expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M); + } + + if (vma->size != expected_vma_size || vma->node.size != expected_node_size) { + err = i915_vma_unbind_unlocked(vma); + err = -EBADSLT; + goto err_put; + } + + err = i915_vma_unbind_unlocked(vma); + if (err) + goto err_put; + + GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); + +err_put: + i915_gem_object_put(obj); + cleanup_freed_objects(vm->i915); + return err; +} + +static int misaligned_pin(struct i915_address_space *vm, + u64 hole_start, u64 hole_end, + unsigned long end_time) +{ + struct intel_memory_region *mr; + enum intel_region_id id; + unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + int err = 0; + u64 hole_size = hole_end - hole_start; + + if (i915_is_ggtt(vm)) + flags |= PIN_GLOBAL; + + for_each_memory_region(mr, vm->i915, id) { + u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id); + u64 size = min_alignment; + u64 addr = round_down(hole_start + (hole_size / 2), min_alignment); + + /* avoid -ENOSPC on very small hole setups */ + if (hole_size < 3 * min_alignment) + continue; + + /* we can't test < 4k alignment due to flags being encoded in lower bits */ + if (min_alignment != I915_GTT_PAGE_SIZE_4K) { + err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags); + /* misaligned should error with -EINVAL*/ + if (!err) + err = -EBADSLT; + if (err != -EINVAL) + return err; + } + + /* test for vma->size expansion to min page size */ + err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags); + if (err) + return err; + + /* test for intermediate size not expanding vma->size for large alignments */ + err = misaligned_case(vm, mr, addr, size / 2, flags); + if (err) + return err; + } + + return 0; +} + static int exercise_ppgtt(struct drm_i915_private *dev_priv, int (*func)(struct i915_address_space *vm, u64 hole_start, u64 hole_end, @@ -1137,6 +1251,11 @@ static int igt_ppgtt_shrink_boom(void *arg) return exercise_ppgtt(arg, shrink_boom); } +static int igt_ppgtt_misaligned_pin(void *arg) +{ + return exercise_ppgtt(arg, misaligned_pin); +} + static int sort_holes(void *priv, const struct list_head *A, const struct list_head *B) { @@ -1209,6 +1328,11 @@ static int igt_ggtt_lowlevel(void *arg) return exercise_ggtt(arg, lowlevel_hole); } +static int igt_ggtt_misaligned_pin(void *arg) +{ + return exercise_ggtt(arg, misaligned_pin); +} + static int igt_ggtt_page(void *arg) { const unsigned int count = PAGE_SIZE/sizeof(u32); @@ -2181,12 +2305,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_ppgtt_fill), SUBTEST(igt_ppgtt_shrink), SUBTEST(igt_ppgtt_shrink_boom), + SUBTEST(igt_ppgtt_misaligned_pin), SUBTEST(igt_ggtt_lowlevel), SUBTEST(igt_ggtt_drunk), SUBTEST(igt_ggtt_walk), SUBTEST(igt_ggtt_pot), SUBTEST(igt_ggtt_fill), SUBTEST(igt_ggtt_page), + SUBTEST(igt_ggtt_misaligned_pin), SUBTEST(igt_cs_tlb), }; From patchwork Fri Feb 18 18:47:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751796 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3A3CFC433FE for ; Fri, 18 Feb 2022 18:48:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 97A4110E8A8; Fri, 18 Feb 2022 18:48:16 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0EC6D10E887; Fri, 18 Feb 2022 18:48:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210085; x=1676746085; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qylx3PcVcQZ/5xYAjaCOuzD/Pb9x0ArWVZ36SE6s75w=; b=eXWDflC3a4YZ4cNITCViKApqPDq307LvdwsyKFYz2zBqFXvT/jspM0aH gw98b9MkFlvF8zllOiElWb/TE9KyKR2RGbj8xgE142S4BPIs1eu1nAt4+ rQePWmVLT0SdTuopgNagHRFwHpeg0npftUoKprS2zBXXWu4YRrWegnP+q NQOoWEcbtx9hhbAlWDxp9fhurI9cZ7V7Xz4u0S29HxOv9Dgy/+aRbAS9C NN1Vv4pnKO2BwoDe8R5p1AQTvMIBpKa8fbh9uJ3mdbahiFmBRvLxTLxWo ysg5kWKyylLWdsUNBRgzcItFAj2lFSWeM9sjuPOxNbiC3G8kRyy0aSNdA Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238592999" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238592999" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:04 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642266" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:02 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 09/15] drm/i915/gtt: allow overriding the pt alignment Date: Sat, 19 Feb 2022 00:17:46 +0530 Message-Id: <20220218184752.7524-10-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld On some platforms we have alignment restrictions when accessing LMEM from the GTT. In the next few patches we need to be able to modify the page-tables directly via the GTT itself. Suggested-by: Ramalingam C Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Ramalingam C Reviewed-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_gtt.h | 10 +++++++++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 16 ++++++++++++---- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 5e038cef0d9f..9d83c2d3959c 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -200,6 +200,14 @@ void *__px_vaddr(struct drm_i915_gem_object *p); struct i915_vm_pt_stash { /* preallocated chains of page tables/directories */ struct i915_page_table *pt[2]; + /* + * Optionally override the alignment/size of the physical page that + * contains each PT. If not set defaults back to the usual + * I915_GTT_PAGE_SIZE_4K. This does not influence the other paging + * structures. MUST be a power-of-two. ONLY applicable on discrete + * platforms. + */ + int pt_sz; }; struct i915_vma_ops { @@ -595,7 +603,7 @@ void free_scratch(struct i915_address_space *vm); struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz); struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz); -struct i915_page_table *alloc_pt(struct i915_address_space *vm); +struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde); diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 043652dc6892..d91e2beb7517 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -12,7 +12,7 @@ #include "gen6_ppgtt.h" #include "gen8_ppgtt.h" -struct i915_page_table *alloc_pt(struct i915_address_space *vm) +struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz) { struct i915_page_table *pt; @@ -20,7 +20,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) if (unlikely(!pt)) return ERR_PTR(-ENOMEM); - pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K); + pt->base = vm->alloc_pt_dma(vm, sz); if (IS_ERR(pt->base)) { kfree(pt); return ERR_PTR(-ENOMEM); @@ -221,17 +221,25 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm, u64 size) { unsigned long count; - int shift, n; + int shift, n, pt_sz; shift = vm->pd_shift; if (!shift) return 0; + pt_sz = stash->pt_sz; + if (!pt_sz) + pt_sz = I915_GTT_PAGE_SIZE_4K; + else + GEM_BUG_ON(!IS_DGFX(vm->i915)); + + GEM_BUG_ON(!is_power_of_2(pt_sz)); + count = pd_count(size, shift); while (count--) { struct i915_page_table *pt; - pt = alloc_pt(vm); + pt = alloc_pt(vm, pt_sz); if (IS_ERR(pt)) { i915_vm_free_pt_stash(vm, stash); return PTR_ERR(pt); From patchwork Fri Feb 18 18:47:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751795 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 16C81C433F5 for ; Fri, 18 Feb 2022 18:48:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 04AA210E8D5; Fri, 18 Feb 2022 18:48:16 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id BDEAE10E8C8; Fri, 18 Feb 2022 18:48:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210087; x=1676746087; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mbuRJ0e7nK525gI4ZD7nkJZtF2bVkOJgOdvwDTzxYXI=; b=SwyjGoX/Q/FWFjFg9Pca40vGec6zEqEBwoHb8BTrhP+1GHKnQwRhglcg MoRTmOOv0wUx6D72XwEDjKBrZkQ7/QA/GWoj9MkhR379Yy7wgPnv+9rIy GJFYXAyI8zL4ykQIwf4HYGVvCoXP/JAVfUxgS9ODuXw9SgjagC4t4pTMI amOUnbeHpcPW7gD+R1o79YvOdEbxxB5Y2b/7BOunVtBFep2V5K+Ee5U9S uhi5/TTyH6DxlXcUmIB1vRm68gxyp8aLVZpmn5+ojRBuRwTo+AgNPtt+F oBUZyrHqR1TjkpswU4Uyx9kFIgAR6jB5knn2RjypbYeNbo1UBmugMZnIe g==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238593008" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238593008" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:07 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642292" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:05 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 10/15] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry Date: Sat, 19 Feb 2022 00:17:47 +0530 Message-Id: <20220218184752.7524-11-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld If this is LMEM then we get a 32 entry PT, with each PTE pointing to some 64K block of memory, otherwise it's just the usual 512 entry PT. This very much assumes the caller knows what they are doing. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Ramalingam C Reviewed-by: Ramalingam C --- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 62471730266c..f574da00eff1 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -715,13 +715,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm, gen8_pdp_for_page_index(vm, idx); struct i915_page_directory *pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); + struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1)); gen8_pte_t *vaddr; - vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); + GEM_BUG_ON(pt->is_compact); + + vaddr = px_vaddr(pt); vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags); clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr)); } +static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm, + dma_addr_t addr, + u64 offset, + enum i915_cache_level level, + u32 flags) +{ + u64 idx = offset >> GEN8_PTE_SHIFT; + struct i915_page_directory * const pdp = + gen8_pdp_for_page_index(vm, idx); + struct i915_page_directory *pd = + i915_pd_entry(pdp, gen8_pd_index(idx, 2)); + struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1)); + gen8_pte_t *vaddr; + + GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K)); + GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K)); + + if (!pt->is_compact) { + vaddr = px_vaddr(pd); + vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K; + pt->is_compact = true; + } + + vaddr = px_vaddr(pt); + vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags); +} + +static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm, + dma_addr_t addr, + u64 offset, + enum i915_cache_level level, + u32 flags) +{ + if (flags & PTE_LM) + return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset, + level, flags); + + return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags); +} + static int gen8_init_scratch(struct i915_address_space *vm) { u32 pte_flags; @@ -921,7 +964,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt, ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND; ppgtt->vm.insert_entries = gen8_ppgtt_insert; - ppgtt->vm.insert_page = gen8_ppgtt_insert_entry; + if (HAS_64K_PAGES(gt->i915)) + ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry; + else + ppgtt->vm.insert_page = gen8_ppgtt_insert_entry; ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc; ppgtt->vm.clear_range = gen8_ppgtt_clear; ppgtt->vm.foreach = gen8_ppgtt_foreach; From patchwork Fri Feb 18 18:47:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751797 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B6424C433F5 for ; Fri, 18 Feb 2022 18:48:25 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6CAD610E946; Fri, 18 Feb 2022 18:48:21 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8011410E8C8; Fri, 18 Feb 2022 18:48:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210090; x=1676746090; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hRIILt9tJhC26kqrnWf/jfBds5O9Mz0NUy2C+A3WfSs=; b=BHEB4KhdEK6PoxntAvFSiScJatAD0H4AwMMH9pd7dDuTxtLNzfxNhdjT G3/Bx86eZLHJ/LEfJQSogNNtxPgbCttVcponU3A9Dv+xY2btywB6/v8m1 1bMRVMOLnCZ8yhPeSk51yR1Ga7bXQZbYuVTKXPNLJhE1mtHL1PDE6lbZy bBoDBvOmNBMfoC7OfU/2nIrrAxPFC3D3lLxs08vJEVECih3OSfW10T4PT /21M29SmRIK0Sx5DPm9Bqw0eYqMHhh6Yds/1sXbjpZXmKNNFiDXR9yYfY NM68gc1sazKNVoLgkt5uPFmHVZsrgnvHkKxTTxfKUXXZnhwzg9L7hK/8P g==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="238593023" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="238593023" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642320" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:07 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 11/15] drm/i915/migrate: add acceleration support for DG2 Date: Sat, 19 Feb 2022 00:17:48 +0530 Message-Id: <20220218184752.7524-12-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld This is all kinds of awkward since we now have to contend with using 64K GTT pages when mapping anything in LMEM(including the page-tables themselves). v2(Ram) - Document the ppGTT layout and add a better description for the different windows. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Ramalingam C Reviewed-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 196 ++++++++++++++++++++---- 1 file changed, 164 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 18b44af56969..20444d6ceb3c 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine) return true; } +static void xehpsdv_toggle_pdes(struct i915_address_space *vm, + struct i915_page_table *pt, + void *data) +{ + struct insert_pte_data *d = data; + + /* + * Insert a dummy PTE into every PT that will map to LMEM to ensure + * we have a correctly setup PDE structure for later use. + */ + vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM); + GEM_BUG_ON(!pt->is_compact); + d->offset += SZ_2M; +} + +static void xehpsdv_insert_pte(struct i915_address_space *vm, + struct i915_page_table *pt, + void *data) +{ + struct insert_pte_data *d = data; + + /* + * We are playing tricks here, since the actual pt, from the hw + * pov, is only 256bytes with 32 entries, or 4096bytes with 512 + * entries, but we are still guaranteed that the physical + * alignment is 64K underneath for the pt, and we are careful + * not to access the space in the void. + */ + vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM); + d->offset += SZ_64K; +} + static void insert_pte(struct i915_address_space *vm, struct i915_page_table *pt, void *data) @@ -74,7 +106,32 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * i.e. within the same non-preemptible window so that we do not switch * to another migration context that overwrites the PTE. * - * TODO: Add support for huge LMEM PTEs + * This changes quite a bit on platforms with HAS_64K_PAGES support, + * where we instead have three windows, each CHUNK_SIZE in size. The + * first is reserved for mapping system-memory, and that just uses the + * 512 entry layout using 4K GTT pages. The other two windows just map + * lmem pages and must use the new compact 32 entry layout using 64K GTT + * pages, which ensures we can address any lmem object that the user + * throws at us. We then also use the xehpsdv_toggle_pdes as a way of + * just toggling the PDE bit(GEN12_PDE_64K) for us, to enable the + * compact layout for each of these page-tables, that fall within the + * [CHUNK_SIZE, 3 * CHUNK_SIZE) range. + * + * We lay the ppGTT out as: + * + * [0, CHUNK_SZ) -> first window/object, maps smem + * [CHUNK_SZ, 2 * CHUNK_SZ) -> second window/object, maps lmem src + * [2 * CHUNK_SZ, 3 * CHUNK_SZ) -> third window/object, maps lmem dst + * + * For the PTE window it's also quite different, since each PTE must + * point to some 64K page, one for each PT(since it's in lmem), and yet + * each is only <= 4096bytes, but since the unused space within that PTE + * range is never touched, this should be fine. + * + * So basically each PT now needs 64K of virtual memory, instead of 4K, + * which looks like: + * + * [3 * CHUNK_SZ, 3 * CHUNK_SZ + ((3 * CHUNK_SZ / SZ_2M) * SZ_64K)] -> PTE */ vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY); @@ -86,6 +143,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm; } + if (HAS_64K_PAGES(gt->i915)) + stash.pt_sz = I915_GTT_PAGE_SIZE_64K; + /* * Each engine instance is assigned its own chunk in the VM, so * that we can run multiple instances concurrently @@ -105,14 +165,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need * 4x2 page directories for source/destination. */ - sz = 2 * CHUNK_SZ; + if (HAS_64K_PAGES(gt->i915)) + sz = 3 * CHUNK_SZ; + else + sz = 2 * CHUNK_SZ; d.offset = base + sz; /* * We need another page directory setup so that we can write * the 8x512 PTE in each chunk. */ - sz += (sz >> 12) * sizeof(u64); + if (HAS_64K_PAGES(gt->i915)) + sz += (sz / SZ_2M) * SZ_64K; + else + sz += (sz >> 12) * sizeof(u64); err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz); if (err) @@ -133,7 +199,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm; /* Now allow the GPU to rewrite the PTE via its own ppGTT */ - vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d); + if (HAS_64K_PAGES(gt->i915)) { + vm->vm.foreach(&vm->vm, base, d.offset - base, + xehpsdv_insert_pte, &d); + d.offset = base + CHUNK_SZ; + vm->vm.foreach(&vm->vm, + d.offset, + 2 * CHUNK_SZ, + xehpsdv_toggle_pdes, &d); + } else { + vm->vm.foreach(&vm->vm, base, d.offset - base, + insert_pte, &d); + } } return &vm->vm; @@ -269,19 +346,38 @@ static int emit_pte(struct i915_request *rq, u64 offset, int length) { + bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915); const u64 encode = rq->context->vm->pte_encode(0, cache_level, is_lmem ? PTE_LM : 0); struct intel_ring *ring = rq->ring; - int total = 0; + int pkt, dword_length; + u32 total = 0; + u32 page_size; u32 *hdr, *cs; - int pkt; GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8); + page_size = I915_GTT_PAGE_SIZE; + dword_length = 0x400; + /* Compute the page directory offset for the target address range */ - offset >>= 12; - offset *= sizeof(u64); - offset += 2 * CHUNK_SZ; + if (has_64K_pages) { + GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M)); + + offset /= SZ_2M; + offset *= SZ_64K; + offset += 3 * CHUNK_SZ; + + if (is_lmem) { + page_size = I915_GTT_PAGE_SIZE_64K; + dword_length = 0x40; + } + } else { + offset >>= 12; + offset *= sizeof(u64); + offset += 2 * CHUNK_SZ; + } + offset += (u64)rq->engine->instance << 32; cs = intel_ring_begin(rq, 6); @@ -289,7 +385,7 @@ static int emit_pte(struct i915_request *rq, return PTR_ERR(cs); /* Pack as many PTE updates as possible into a single MI command */ - pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5); + pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5); hdr = cs; @@ -299,6 +395,8 @@ static int emit_pte(struct i915_request *rq, do { if (cs - hdr >= pkt) { + int dword_rem; + *hdr += cs - hdr - 2; *cs++ = MI_NOOP; @@ -310,7 +408,18 @@ static int emit_pte(struct i915_request *rq, if (IS_ERR(cs)) return PTR_ERR(cs); - pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5); + dword_rem = dword_length; + if (has_64K_pages) { + if (IS_ALIGNED(total, SZ_2M)) { + offset = round_up(offset, SZ_64K); + } else { + dword_rem = SZ_2M - (total & (SZ_2M - 1)); + dword_rem /= page_size; + dword_rem *= 2; + } + } + + pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5); hdr = cs; @@ -319,13 +428,15 @@ static int emit_pte(struct i915_request *rq, *cs++ = upper_32_bits(offset); } + GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size)); + *cs++ = lower_32_bits(encode | it->dma); *cs++ = upper_32_bits(encode | it->dma); offset += 8; - total += I915_GTT_PAGE_SIZE; + total += page_size; - it->dma += I915_GTT_PAGE_SIZE; + it->dma += page_size; if (it->dma >= it->max) { it->sg = __sg_next(it->sg); if (!it->sg || sg_dma_len(it->sg) == 0) @@ -356,7 +467,8 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; } -static int emit_copy(struct i915_request *rq, int size) +static int emit_copy(struct i915_request *rq, + u32 dst_offset, u32 src_offset, int size) { const int ver = GRAPHICS_VER(rq->engine->i915); u32 instance = rq->engine->instance; @@ -371,31 +483,31 @@ static int emit_copy(struct i915_request *rq, int size) *cs++ = BLT_DEPTH_32 | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; *cs++ = instance; } else if (ver >= 8) { *cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; *cs++ = instance; } else { GEM_BUG_ON(instance); *cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; } intel_ring_advance(rq, cs); @@ -423,6 +535,7 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); do { + u32 src_offset, dst_offset; int len; rq = i915_request_create(ce); @@ -450,15 +563,28 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0, - CHUNK_SZ); + src_offset = 0; + dst_offset = CHUNK_SZ; + if (HAS_64K_PAGES(ce->engine->i915)) { + GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); + + src_offset = 0; + dst_offset = 0; + if (src_is_lmem) + src_offset = CHUNK_SZ; + if (dst_is_lmem) + dst_offset = 2 * CHUNK_SZ; + } + + len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, + src_offset, CHUNK_SZ); if (len <= 0) { err = len; goto out_rq; } err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem, - CHUNK_SZ, len); + dst_offset, len); if (err < 0) goto out_rq; if (err < len) { @@ -470,7 +596,7 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - err = emit_copy(rq, len); + err = emit_copy(rq, dst_offset, src_offset, len); /* Arbitration is re-enabled between requests. */ out_rq: @@ -488,14 +614,15 @@ intel_context_migrate_copy(struct intel_context *ce, return err; } -static int emit_clear(struct i915_request *rq, int size, u32 value) +static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value) { const int ver = GRAPHICS_VER(rq->engine->i915); - u32 instance = rq->engine->instance; u32 *cs; GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); + offset += (u64)rq->engine->instance << 32; + cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -505,17 +632,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value) *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = 0; /* offset */ - *cs++ = instance; + *cs++ = lower_32_bits(offset); + *cs++ = upper_32_bits(offset); *cs++ = value; *cs++ = MI_NOOP; } else { - GEM_BUG_ON(instance); + GEM_BUG_ON(upper_32_bits(offset)); *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = 0; + *cs++ = lower_32_bits(offset); *cs++ = value; } @@ -542,6 +669,7 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); do { + u32 offset; int len; rq = i915_request_create(ce); @@ -569,7 +697,11 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ); + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + offset = CHUNK_SZ; + + len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ); if (len <= 0) { err = len; goto out_rq; @@ -579,7 +711,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - err = emit_clear(rq, len, value); + err = emit_clear(rq, offset, len, value); /* Arbitration is re-enabled between requests. */ out_rq: From patchwork Fri Feb 18 18:47:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751800 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AB7FAC433EF for ; Fri, 18 Feb 2022 18:48:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2E0B210E911; Fri, 18 Feb 2022 18:48:23 +0000 (UTC) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTPS id C590910E8EE; Fri, 18 Feb 2022 18:48:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210096; x=1676746096; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=A1Py057cgx8vK/xPqhYzp1XVx50RaQUDPsgswyL714g=; b=FgKPE69D9ZVjPm6jQbMewGFgfWyqpk2VFaNWhBLxftgDeSA3ayGAXBkb eL/3Rjk+iJPgjw+d4enf3d6LzjAMP+6RtkD+qD3p9YMqTIiZ57mRoZWFK 4sdWWfeNwmlls8AM1ak8Ex3ZcZXole0/4Q/NkJYh/DxOpYI3jdclZ9NAZ v5OfqmAKhuyAyzoZFVNeRPzI+oocAXK7cwNq5mBIAdoKhSkT3cdhsqaoL 7jgjVsGKG5LqZxSMidvlYrhwxn5FdqARWjxO+yJfCGBX845WWKXQFISNB UVnQVhI8zEt1rXWZYwY7jeJm81hHDr/xE5Zy6iJbGQFmWG9e983EYh6rx g==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="251388808" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="251388808" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:16 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642384" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:10 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 12/15] drm/i915/uapi: document behaviour for DG2 64K support Date: Sat, 19 Feb 2022 00:17:49 +0530 Message-Id: <20220218184752.7524-13-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tony Ye , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , lucas.demarchi@intel.com, Kenneth Graunke , Slawomir Milczarek , Matthew Auld , Jordan Justen , mesa-dev@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Matthew Auld On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this. v4: Kdoc modification. v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel] Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Acked-by: Jordan Justen Reviewed-by: Ramalingam C Reviewed-by: Thomas Hellström cc: Simon Ser cc: Pekka Paalanen Cc: Jordan Justen Cc: Kenneth Graunke Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye Cc: Slawomir Milczarek --- include/uapi/drm/i915_drm.h | 45 ++++++++++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 5 deletions(-) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 914ebd9290e5..05c3642aaece 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 { /** * When the EXEC_OBJECT_PINNED flag is specified this is populated by * the user with the GTT offset at which this object will be pinned. + * * When the I915_EXEC_NO_RELOC flag is specified this must contain the * presumed_offset of the object. + * * During execbuffer2 the kernel populates it with the value of the * current GTT offset of the object, for future presumed_offset writes. + * + * See struct drm_i915_gem_create_ext for the rules when dealing with + * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with + * minimum page sizes, like DG2. */ __u64 offset; @@ -3144,11 +3150,40 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. * - * Note that for some devices we have might have further minimum - * page-size restrictions(larger than 4K), like for device local-memory. - * However in general the final size here should always reflect any - * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS - * extension to place the object in device local-memory. + * + * DG2 64K min page size implications: + * + * On discrete platforms, starting from DG2, we have to contend with GTT + * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE + * objects. Specifically the hardware only supports 64K or larger GTT + * page sizes for such memory. The kernel will already ensure that all + * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page + * sizes underneath. + * + * Note that the returned size here will always reflect any required + * rounding up done by the kernel, i.e 4K will now become 64K on devices + * such as DG2. + * + * Special DG2 GTT address alignment requirement: + * + * The GTT alignment will also need to be at least 2M for such objects. + * + * Note that due to how the hardware implements 64K GTT page support, we + * have some further complications: + * + * 1) The entire PDE (which covers a 2MB virtual address range), must + * contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same + * PDE is forbidden by the hardware. + * + * 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM + * objects. + * + * To keep things simple for userland, we mandate that any GTT mappings + * must be aligned to and rounded up to 2MB. The kernel will internally + * pad them out to the next 2MB boundary. As this only wastes virtual + * address space and avoids userland having to copy any needlessly + * complicated PDE sharing scheme (coloring) and only affects DG2, this + * is deemed to be a good compromise. */ __u64 size; /** From patchwork Fri Feb 18 18:47:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751802 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3DCF3C433EF for ; Fri, 18 Feb 2022 18:48:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 69E7410E99A; Fri, 18 Feb 2022 18:48:24 +0000 (UTC) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTPS id 300A310E8FA; Fri, 18 Feb 2022 18:48:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210098; x=1676746098; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2McoXFW4blWiG44CtZCyhsypC8+YJg8m7+56KYFgi8Y=; b=Wd0oTb3bDfrHGunpaKL6cBwYtl+e7G0byKGg/WwpHyvx45i+06VoPwts fwAbsWTPCgS7Z2IcP7knPRgh4VHDF0Bn8pqdGc3U7aaK5G63H3m0HDifd 6NehgC08xlY5YeUupTbMadU/0kohpbtj2eWgp4l6VE3BHt4/kg8zD9Mj9 Qi1ggcriXDxqMrGKNWxWoqupwronnj0H5JnjQaETeioS9TpDb3WiC70TQ 77wk1EKEmAL5zlthpfnubFQQ/u/auwVQ7DgDi0b0Lo9XaeMrEzQmGb3qt sROtIYYft/9KHvsQq3yLVMvrtYeM3PMQDWImFb5gxEFV/hVpypVQe8lxK Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="251388815" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="251388815" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:17 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642424" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:15 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 13/15] drm/i915/xehpsdv: Add has_flat_ccs to device info Date: Sat, 19 Feb 2022 00:17:50 +0530 Message-Id: <20220218184752.7524-14-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lucas.demarchi@intel.com, CQ Tang , Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: CQ Tang Platforms of XeHP and beyond support 3D surface (buffer) compression and various compression formats. This is accomplished by an additional compression control state (CCS) stored for each surface. Gen 12 devices(TGL family and DG1) stores compression states in a separate region of memory. It is managed by user-space and has an associated set of user-space managed page tables used by hardware for address translation. In Xe HP and beyond (XEHPSDV, DG2, etc), there is a new feature introduced i.e Flat CCS. It replaced AUX page tables with a flat indexed region of device memory for storing compression states. Cc: Joonas Lahtinen Cc: Matthew Auld Signed-off-by: CQ Tang Signed-off-by: Ramalingam C Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 6 ++++++ drivers/gpu/drm/i915/i915_pci.c | 1 + drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 4a3ac66e777a..1c2f4ae4ebf9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1356,6 +1356,12 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM) +/* + * Platform has the dedicated compression control state for each lmem surfaces + * stored in lmem to support the 3D and media compression formats. + */ +#define HAS_FLAT_CCS(dev_priv) (INTEL_INFO(dev_priv)->has_flat_ccs) + #define HAS_GT_UC(dev_priv) (INTEL_INFO(dev_priv)->has_gt_uc) #define HAS_POOLED_EU(dev_priv) (INTEL_INFO(dev_priv)->has_pooled_eu) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 8df8887d76ae..f449c454b6f8 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1005,6 +1005,7 @@ static const struct intel_device_info adl_p_info = { XE_HP_PAGE_SIZES, \ .dma_mask_size = 46, \ .has_64bit_reloc = 1, \ + .has_flat_ccs = 1, \ .has_global_mocs = 1, \ .has_gt_uc = 1, \ .has_llc = 1, \ diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index f75673da768d..2508a47fb3f5 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -134,6 +134,7 @@ enum intel_ppgtt_type { func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ + func(has_flat_ccs); \ func(has_global_mocs); \ func(has_gt_uc); \ func(has_guc_deprivilege); \ From patchwork Fri Feb 18 18:47:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751801 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7B10FC433F5 for ; Fri, 18 Feb 2022 18:48:34 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 34AA410E685; Fri, 18 Feb 2022 18:48:24 +0000 (UTC) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0BF0C10E911; Fri, 18 Feb 2022 18:48:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210101; x=1676746101; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QPwPhXlY6E60GxXCmIPmPxpHMmRfjfanJdvrFi/OmMg=; b=Wve9lVZgV6f4ACXZTqDQdmJXAxxvym/f/1glI7tVmRT1+AvvPWdg8Guk wLVtFYZhBeB5kiACsGEAj1fPP1e4B45D3z17hrjGUnW7QRw6b/kMN8ll8 8K0oLY2Ggi5qVjOdtorwAo4WlfgaocuCiduv4/PFlRnYz2+n9f3rBunBj fa4CdzYhRRU+OghzeDyJDgkDsKwqQioJoE79HOXjjkQs4dfS/VPQ3XE9C G/F3wU+aq/QsHd9pehuVd2sLy/fn93SmQL+pLJC951/c03ZbhSWOHGtYE kDmoYBRtZ54aV9i+EyxbsUfhehCCRF35enP9PklakT4E6jDNcoWq5uWGU w==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="251388829" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="251388829" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:20 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642465" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:18 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 14/15] drm/i915/lmem: Enable lmem for platforms with Flat CCS Date: Sat, 19 Feb 2022 00:17:51 +0530 Message-Id: <20220218184752.7524-15-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Abdiel Janulgue , lucas.demarchi@intel.com, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Abdiel Janulgue A portion of device memory is reserved for Flat CCS so usable device memory will be reduced by size of Flat CCS. Size of Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”. So to get effective device memory we need to subtract total device memory by Flat CCS memory size. v2: Addressed the small bar related issue [Matt] Removed a reduntant check [Matt] v3: reg addr def is moved to intel_gt_regs.h [Lucas] removed a variable s/DRM_ERROR/drm_err [Lucas] Cc: Matthew Auld Signed-off-by: Abdiel Janulgue Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/i915/gt/intel_gt.c | 19 +++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_gt_regs.h | 3 +++ drivers/gpu/drm/i915/gt/intel_region_lmem.c | 26 +++++++++++++++++++-- 4 files changed, 47 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e8403fa53909..2da7dd0f66d7 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -913,6 +913,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg) return intel_uncore_read_fw(gt->uncore, reg); } +u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg) +{ + int type; + u8 sliceid, subsliceid; + + for (type = 0; type < NUM_STEERING_TYPES; type++) { + if (intel_gt_reg_needs_read_steering(gt, reg, type)) { + intel_gt_get_valid_steering(gt, type, &sliceid, + &subsliceid); + return intel_uncore_read_with_mcr_steering(gt->uncore, + reg, + sliceid, + subsliceid); + } + } + + return intel_uncore_read(gt->uncore, reg); +} + void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 2dad46c3eff2..0f571c8ee22b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt, } u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); +u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg); void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index bf4b942c62ee..935ba793a13b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -906,6 +906,9 @@ #define XEHP_L3NODEARBCFG _MMIO(0xb0b4) #define XEHP_LNESPARE REG_BIT(19) +#define XEHPSDV_FLAT_CCS_BASE_ADDR _MMIO(0x4910) +#define XEHPSDV_CCS_BASE_SHIFT 8 + #define GEN8_L3SQCREG1 _MMIO(0xb100) /* * Note that on CHV the following has an off-by-one error wrt. to BSpec. diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index cb3f66707b21..f3f0ce2c553a 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -12,6 +12,7 @@ #include "gem/i915_gem_region.h" #include "gem/i915_gem_ttm.h" #include "gt/intel_gt.h" +#include "gt/intel_gt_regs.h" static int init_fake_lmem_bar(struct intel_memory_region *mem) { @@ -206,8 +207,29 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (!IS_DGFX(i915)) return ERR_PTR(-ENODEV); - /* Stolen starts from GSMBASE on DG1 */ - lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE); + if (HAS_FLAT_CCS(i915)) { + u64 tile_stolen, flat_ccs_base; + + lmem_size = pci_resource_len(pdev, 2); + flat_ccs_base = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR); + flat_ccs_base = (flat_ccs_base >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K; + + if (GEM_WARN_ON(lmem_size < flat_ccs_base)) + return ERR_PTR(-ENODEV); + + tile_stolen = lmem_size - flat_ccs_base; + + /* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */ + if (tile_stolen == lmem_size) + drm_err(&i915->drm, + "CCS_BASE_ADDR register did not have expected value\n"); + + lmem_size -= tile_stolen; + } else { + /* Stolen starts from GSMBASE without CCS */ + lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE); + } + io_start = pci_resource_start(pdev, 2); if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2))) From patchwork Fri Feb 18 18:47:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12751803 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E1D2CC433FE for ; Fri, 18 Feb 2022 18:48:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5827D10E9A0; Fri, 18 Feb 2022 18:48:26 +0000 (UTC) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTPS id B486310E8FA; Fri, 18 Feb 2022 18:48:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645210103; x=1676746103; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+NmTkdBXabwQBVcTobA69dhtxWgh2gLVuhu5fK1PJBk=; b=m24zFSnKls03rzZlGyXJGlTKV2hYTH+cDzn15yj36uTDZSMNLQv+N+VM VRd61KVtr+vDKKnnWPlJGqHd6ppa0Qh5igJsAFIVUb2gXov/AZKq6qz3c Xcj1XGw7JcCYiksgcUwmJUI3waIeP+/RYR1pbf6lfJYZf5st6oBMBxqY6 bGIc6H7LateC5vkxCTiEUgSbWoX6RGKR/f6zjmKYC7dDtFtx9vtvxuxrK JSWCA8+vRtnb5MUixaT7EucVgbNiu5ReEAhKTWdY/pXwD5U9gFgrWxibv RLuVh4/BU4dweZy39vHG14nOv7kjuf3n1NGeX5eaTthITyembtn1nvBfo Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10262"; a="251388847" X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="251388847" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:23 -0800 X-IronPort-AV: E=Sophos;i="5.88,379,1635231600"; d="scan'208";a="489642494" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Feb 2022 10:48:20 -0800 From: Ramalingam C To: intel-gfx , dri-devel Subject: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms Date: Sat, 19 Feb 2022 00:17:52 +0530 Message-Id: <20220218184752.7524-16-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220218184752.7524-1-ramalingam.c@intel.com> References: <20220218184752.7524-1-ramalingam.c@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lucas.demarchi@intel.com, CQ Tang , Ayaz A Siddiqui Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Ayaz A Siddiqui Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer. Flat CCS memory can not be directly accessed by S/W. Address of CCS buffer associated main BO is automatically calculated by device itself. KMD/UMD can only access this buffer indirectly using XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer. v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob] v4: Fixed ccs blk calculation [Ram] Added Kdoc on flat-ccs. Cc: CQ Tang Signed-off-by: Ayaz A Siddiqui Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++ drivers/gpu/drm/i915/gt/intel_migrate.c | 145 ++++++++++++++++++- 2 files changed, 156 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index f8253012d166..166de5436c4a 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -203,6 +203,21 @@ #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) +#define XY_CTRL_SURF_INSTR_SIZE 5 +#define MI_FLUSH_DW_SIZE 3 +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | 3) +#define SRC_ACCESS_TYPE_SHIFT 21 +#define DST_ACCESS_TYPE_SHIFT 20 +#define CCS_SIZE_SHIFT 8 +#define XY_CTRL_SURF_MOCS_SHIFT 25 +#define NUM_CCS_BYTES_PER_BLOCK 256 +#define NUM_BYTES_PER_CCS_BYTE 256 +#define NUM_CCS_BLKS_PER_XFER 1024 +#define INDIRECT_ACCESS 0 +#define DIRECT_ACCESS 1 +#define MI_FLUSH_LLC BIT(9) +#define MI_FLUSH_CCS BIT(16) + #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) #define SRC_COPY_BLT_CMD (2 << 29 | 0x43 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 20444d6ceb3c..9f9cd2649377 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -16,6 +16,8 @@ struct insert_pte_data { }; #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ +#define GET_CCS_BYTES(i915, size) (HAS_FLAT_CCS(i915) ? \ + DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0) static bool engine_supports_migration(struct intel_engine_cs *engine) { @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; } +/** + * DOC: Flat-CCS - Memory compression for Local memory + * + * On Xe-HP and later devices, we use dedicated compression control state (CCS) + * stored in local memory for each surface, to support the 3D and media + * compression formats. + * + * The memory required for the CCS of the entire local memory is 1/256 of the + * local memory size. So before the kernel boot, the required memory is reserved + * for the CCS data and a secure register will be programmed with the CCS base + * address. + * + * Flat CCS data needs to be cleared when a lmem object is allocated. + * And CCS data can be copied in and out of CCS region through + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. + * + * When we exaust the lmem, if the object's placements support smem, then we can + * directly decompress the compressed lmem object into smem and start using it + * from smem itself. + * + * But when we need to swapout the compressed lmem object into a smem region + * though objects' placement doesn't support smem, then we copy the lmem content + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). + * When the object is referred, lmem content will be swaped in along with + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding + * location. + */ + +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags) +{ + /* Mask the 3 LSB to use the PPGTT address space */ + *cmd++ = MI_FLUSH_DW | flags; + *cmd++ = lower_32_bits(dst); + *cmd++ = upper_32_bits(dst); + + return cmd; +} + +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size) +{ + u32 num_cmds, num_blks, total_size; + + if (!GET_CCS_BYTES(i915, size)) + return 0; + + /* + * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte + * blocks. one XY_CTRL_SURF_COPY_BLT command can + * trnasfer upto 1024 blocks. + */ + num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), + NUM_CCS_BYTES_PER_BLOCK); + num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER); + total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds; + + /* + * We need to add a flush before and after + * XY_CTRL_SURF_COPY_BLT + */ + total_size += 2 * MI_FLUSH_DW_SIZE; + return total_size; +} + +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr, + u8 src_mem_access, u8 dst_mem_access, + int src_mocs, int dst_mocs, + u16 num_ccs_blocks) +{ + int i = num_ccs_blocks; + + /* + * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS + * data in and out of the CCS region. + * + * We can copy at most 1024 blocks of 256 bytes using one + * XY_CTRL_SURF_COPY_BLT instruction. + * + * In case we need to copy more than 1024 blocks, we need to add + * another instruction to the same batch buffer. + * + * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS. + * + * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM. + */ + do { + /* + * We use logical AND with 1023 since the size field + * takes values which is in the range of 0 - 1023 + */ + *cmd++ = ((XY_CTRL_SURF_COPY_BLT) | + (src_mem_access << SRC_ACCESS_TYPE_SHIFT) | + (dst_mem_access << DST_ACCESS_TYPE_SHIFT) | + (((i - 1) & 1023) << CCS_SIZE_SHIFT)); + *cmd++ = lower_32_bits(src_addr); + *cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) | + (src_mocs << XY_CTRL_SURF_MOCS_SHIFT)); + *cmd++ = lower_32_bits(dst_addr); + *cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) | + (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT)); + src_addr += SZ_64M; + dst_addr += SZ_64M; + i -= NUM_CCS_BLKS_PER_XFER; + } while (i > 0); + + return cmd; +} + static int emit_copy(struct i915_request *rq, u32 dst_offset, u32 src_offset, int size) { @@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce, return err; } -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value) +static int emit_clear(struct i915_request *rq, u64 offset, int size, + u32 value, bool is_lmem) { - const int ver = GRAPHICS_VER(rq->engine->i915); + struct drm_i915_private *i915 = rq->engine->i915; + const int ver = GRAPHICS_VER(i915); + u32 num_ccs_blks, ccs_ring_size; u32 *cs; GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); offset += (u64)rq->engine->instance << 32; - cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); + /* Clear flat css only when value is 0 */ + ccs_ring_size = (is_lmem && !value) ? + calc_ctrl_surf_instr_size(i915, size) : 0; + + cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2)); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value) *cs++ = value; } + if (is_lmem && HAS_FLAT_CCS(i915) && !value) { + num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), + NUM_CCS_BYTES_PER_BLOCK); + + /* + * Flat CCS surface can only be accessed via + * XY_CTRL_SURF_COPY_BLT CMD and using indirect + * mapping of associated LMEM. + * We can clear ccs surface by writing all 0s, + * so we will flush the previously cleared buffer + * and use it as a source. + */ + cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS); + cs = _i915_ctrl_surf_copy_blt(cs, offset, offset, + DIRECT_ACCESS, INDIRECT_ACCESS, + 1, 1, num_ccs_blks); + cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS); + + if (ccs_ring_size & 1) + *cs++ = MI_NOOP; + } intel_ring_advance(rq, cs); return 0; } @@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - err = emit_clear(rq, offset, len, value); + err = emit_clear(rq, offset, len, value, is_lmem); /* Arbitration is re-enabled between requests. */ out_rq: