From patchwork Tue Mar 21 17:09:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andi Shyti
X-Patchwork-Id: 13182991
From: Andi Shyti
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Matt Roper
Cc: Andi Shyti
Date: Tue, 21 Mar 2023 18:09:35 +0100
Message-Id: <20230321170936.478631-2-andi.shyti@linux.intel.com>
In-Reply-To: <20230321170936.478631-1-andi.shyti@linux.intel.com>
References: <20230321170936.478631-1-andi.shyti@linux.intel.com>
Subject: [Intel-gfx] [PATCH v2 1/2] drm/i915: Sanitycheck MMIO access early in driver load

From: Matt Roper

We occasionally see the PCI device in a non-accessible state at the
point the driver is loaded. When this happens, all BAR accesses will
read back as 0xFFFFFFFF. Rather than reading registers and
misinterpreting their (invalid) values, let's specifically check for
0xFFFFFFFF in a register that cannot have that value to see if the
device is accessible.

Signed-off-by: Matt Roper
Cc: Mika Kuoppala
Signed-off-by: Andi Shyti
Reviewed-by: Andi Shyti
---
 drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index e1e1f34490c8e..0b69081d6d285 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2602,11 +2602,46 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
 	return 0;
 }
 
+static int sanity_check_mmio_access(struct intel_uncore *uncore)
+{
+	struct drm_i915_private *i915 = uncore->i915;
+	int ret;
+
+	if (GRAPHICS_VER(i915) < 8)
+		return 0;
+
+	/*
+	 * Sanitycheck that MMIO access to the device is working properly. If
+	 * the CPU is unable to communicate with a PCI device, BAR reads will
+	 * return 0xFFFFFFFF. Let's make sure the device isn't in this state
+	 * before we start trying to access registers.
+	 *
+	 * We use the primary GT's forcewake register as our guinea pig since
+	 * it's been around since HSW and it's a masked register so the upper
+	 * 16 bits can never read back as 1's if device access is operating
+	 * properly.
+	 *
+	 * If MMIO isn't working, we'll wait up to 2 seconds to see if it
+	 * recovers, then give up.
+	 */
+	ret = intel_wait_for_register_fw(uncore, FORCEWAKE_MT, 0, 0, 2000000);
+	if (ret == -ETIMEDOUT) {
+		drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
 int intel_uncore_init_mmio(struct intel_uncore *uncore)
 {
 	struct drm_i915_private *i915 = uncore->i915;
 	int ret;
 
+	ret = sanity_check_mmio_access(uncore);
+	if (ret)
+		return ret;
+
 	/*
 	 * The boot firmware initializes local memory and assesses its health.
 	 * If memory training fails, the punit will have been instructed to

From patchwork Tue Mar 21 17:09:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andi Shyti
X-Patchwork-Id: 13182992
From: Andi Shyti
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Matt Roper
Cc: Andi Shyti
Date: Tue, 21 Mar 2023 18:09:36 +0100
Message-Id: <20230321170936.478631-3-andi.shyti@linux.intel.com>
In-Reply-To: <20230321170936.478631-1-andi.shyti@linux.intel.com>
References: <20230321170936.478631-1-andi.shyti@linux.intel.com>
Subject: [Intel-gfx] [PATCH v2 2/2] drm/i915: Check for unreliable MMIO during forcewake

From: Matt Roper

Although we now sanitycheck MMIO access during driver load to make
sure the MMIO BAR isn't returning all 0xFFFFFFFF, there have been a
few cases where (temporarily?) unreliable MMIO access has happened
after GPU resets or power events. We'll often notice this on our next
GT register access since forcewake handling will fail; let's change
our handling slightly so that when this happens we print a more
meaningful message clarifying that the problem is the MMIO access, not
forcewake specifically.

Signed-off-by: Matt Roper
Cc: Mika Kuoppala
Signed-off-by: Andi Shyti
Reviewed-by: Andi Shyti
Reviewed-by: Andrzej Hajda
---
 drivers/gpu/drm/i915/intel_uncore.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0b69081d6d285..303a5d38c93a5 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -178,9 +178,15 @@ static inline void
 fw_domain_wait_ack_clear(const struct intel_uncore_forcewake_domain *d)
 {
 	if (wait_ack_clear(d, FORCEWAKE_KERNEL)) {
-		drm_err(&d->uncore->i915->drm,
-			"%s: timed out waiting for forcewake ack to clear.\n",
-			intel_uncore_forcewake_domain_to_str(d->id));
+		if (fw_ack(d) == ~0)
+			drm_err(&d->uncore->i915->drm,
+				"%s: MMIO unreliable (forcewake register returns 0xFFFFFFFF)!\n",
+				intel_uncore_forcewake_domain_to_str(d->id));
+		else
+			drm_err(&d->uncore->i915->drm,
+				"%s: timed out waiting for forcewake ack to clear.\n",
+				intel_uncore_forcewake_domain_to_str(d->id));
+
 		add_taint_for_CI(d->uncore->i915, TAINT_WARN); /* CI now unreliable */
 	}
 }
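
For readers following the series outside the kernel tree, the idea behind both patches can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: mmio_read_fn, mmio_became_accessible(), classify_ack_timeout(), struct fake_device and fake_read() are all hypothetical names invented here, and the loop is bounded by a read count where a real driver would bound it by time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value a BAR read returns when the CPU cannot reach the PCI device. */
#define MMIO_ALL_ONES UINT32_C(0xFFFFFFFF)

/* Hypothetical stand-in for a raw register read; not an i915 function. */
typedef uint32_t (*mmio_read_fn)(void *ctx);

/*
 * Patch 1 idea: poll a register that can never legitimately read as all
 * ones, and treat the device as accessible once any other value appears.
 * Returns false if every one of max_polls reads came back as all ones.
 */
static bool mmio_became_accessible(mmio_read_fn read, void *ctx,
				   unsigned int max_polls)
{
	assert(read != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (read(ctx) != MMIO_ALL_ONES)
			return true;
	}
	return false;
}

enum ack_timeout_cause {
	ACK_TIMEOUT_MMIO_UNRELIABLE,	/* register reads back as all ones */
	ACK_TIMEOUT_ACK_STUCK,		/* register readable, ack never cleared */
};

/*
 * Patch 2 idea: after an ack wait has already timed out, look at the last
 * ack value to decide which message is the accurate one to report.
 */
static enum ack_timeout_cause classify_ack_timeout(uint32_t ack_value)
{
	return ack_value == MMIO_ALL_ONES ? ACK_TIMEOUT_MMIO_UNRELIABLE
					  : ACK_TIMEOUT_ACK_STUCK;
}

/* Simulated device: unreachable for the first dead_reads reads. */
struct fake_device {
	unsigned int dead_reads;
	uint32_t live_value;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_device *dev = ctx;

	if (dev->dead_reads > 0) {
		dev->dead_reads--;
		return MMIO_ALL_ONES;
	}
	return dev->live_value;
}
```

With a fake device that stays dead for three reads, mmio_became_accessible(fake_read, &dev, 10) reports success; with one that stays dead for longer than the poll budget, it reports failure, which corresponds to the -EIO path in patch 1.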