From patchwork Sat Mar 11 06:37:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13170642
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 5503EC678D5
 for ; Sat, 11 Mar 2023 06:38:22 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 5E0F410EA57;
 Sat, 11 Mar 2023 06:38:18 +0000 (UTC)
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1B7B310E0A1;
 Sat, 11 Mar 2023 06:38:16 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1678516696; x=1710052696;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=7J7lRFZPUr3BHW7/m3oEBmgCduD9y0HNKFUPix8KbTs=;
 b=Q9WOomSWm3qP+nRhKN3On4PNOafiyae+bRCBUrywFbDT8bTuOMnQ/y6l
 qgk1QwC+ApqSvo+pWUtmO+yDCIjfQZlUtYK+i3AVdkqqys7BKLhpFXrkd
 72JPyQIXpvaINcvcVugfB30pljAEDDpK/b3jFo8yykqpgZSa/kclJv1+D
 mTG87wsyG422tMkLCsWVYcVmcSmQEroKd18Q3+hxQ36/4RyuswV/id27U
 JmtHuPa7LmwLpii4+0GOUJWEndFQHQ+1wHzf9YsOhJ6ICqBU6GUnImNUe
 jIZz+UJwrWjD5obplz/FxYTGLk/3PX3PKzhWPKNMrH3YGiYg5fc/Q65d6
 A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="335579552"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="335579552"
Received: from orsmga002.jf.intel.com ([10.7.209.21])
 by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 10 Mar 2023 22:38:15 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="678086694"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="678086694"
Received:
 from relo-linux-5.jf.intel.com ([10.165.21.152])
 by orsmga002.jf.intel.com with ESMTP; 10 Mar 2023 22:38:15 -0800
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Date: Fri, 10 Mar 2023 22:37:12 -0800
Message-Id: <20230311063714.570389-2-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230311063714.570389-1-John.C.Harrison@Intel.com>
References: <20230311063714.570389-1-John.C.Harrison@Intel.com>
MIME-Version: 1.0
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way,
 Swindon SN3 1RJ
Subject: [Intel-gfx] [PATCH v3 1/3] drm/i915/guc: Fix missing ecodes
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Michael Cheng, Alan Previn, Matt Roper, Matthew Auld, Lucas De Marchi,
 DRI-Devel@Lists.FreeDesktop.Org, Rodrigo Vivi
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

From: John Harrison

Error captures are tagged with an 'ecode'. This is a pseudo-unique magic
number that is meant to distinguish similar-seeming bugs with different
underlying signatures. It is a combination of two ring state
registers. Unfortunately, the register state being used is only valid
in execlist mode. In GuC mode, the register state exists in a separate
list of arbitrary register address/value pairs rather than the named
entry structure. So, search through that list to find the two exciting
registers and copy them over to the structure's named members.

v2: if else if instead of if if (Alan)

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
Cc: Alan Previn
Cc: Umesh Nerlige Ramappa
Cc: Lucas De Marchi
Cc: Jani Nikula
Cc: Joonas Lahtinen
Cc: Rodrigo Vivi
Cc: Tvrtko Ursulin
Cc: Matt Roper
Cc: Aravind Iddamsetty
Cc: Michael Cheng
Cc: Matthew Brost
Cc: Bruce Chang
Cc: Daniele Ceraolo Spurio
Cc: Matthew Auld
---
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 101d44de729b1..36196cbb24c6b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1562,6 +1562,27 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
 
 #endif //CONFIG_DRM_I915_CAPTURE_ERROR
 
+static void guc_capture_find_ecode(struct intel_engine_coredump *ee)
+{
+	struct gcap_reg_list_info *reginfo;
+	struct guc_mmio_reg *regs;
+	i915_reg_t reg_ipehr = RING_IPEHR(0);
+	i915_reg_t reg_instdone = RING_INSTDONE(0);
+	int i;
+
+	if (!ee->guc_capture_node)
+		return;
+
+	reginfo = ee->guc_capture_node->reginfo + GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE;
+	regs = reginfo->regs;
+	for (i = 0; i < reginfo->num_regs; i++) {
+		if (regs[i].offset == reg_ipehr.reg)
+			ee->ipehr = regs[i].value;
+		else if (regs[i].offset == reg_instdone.reg)
+			ee->instdone.instdone = regs[i].value;
+	}
+}
+
 void intel_guc_capture_free_node(struct intel_engine_coredump *ee)
 {
 	if (!ee || !ee->guc_capture_node)
@@ -1601,6 +1622,7 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
 			list_del(&n->link);
 			ee->guc_capture_node = n;
 			ee->guc_capture = guc->capture;
+			guc_capture_find_ecode(ee);
 			return;
 		}
 	}

From patchwork Sat Mar 11 06:37:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John
 Harrison
X-Patchwork-Id: 13170641
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id CE421C61DA4
 for ; Sat, 11 Mar 2023 06:38:18 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id D862810E09D;
 Sat, 11 Mar 2023 06:38:17 +0000 (UTC)
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 3BD3010E09D;
 Sat, 11 Mar 2023 06:38:16 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1678516696; x=1710052696;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=9xJ78ZPjSwZj2s9QdeqUFq5jM7hTALRDfyEnmi9fAbM=;
 b=jjDpuox6jjcwYcKY5CJgOj3o9OsJ3yxYsn6SEuBgnGP7FQxdHWifsbKe
 C55frbMI5Q0xoSx9840OwYY1q3Fu3ZxboggcvXkhAc6RdGyNcH2o7g+Wt
 r9Hh4sJSIRdPzNrTnvRGS3/7qefr+0LxKVlEb2XrTJqgdkVbZa685DUmu
 EE6E/x4aycKoqCMeZmdaRKfDfQ9MmiSVOSr9foLbXcVo2ku6sfSOKVKyy
 /W9u8hHu8O2TIdxiqRVP5l8rklsDPxook3wQopi4fsGMJk9cCNSk3GVij
 bxbqEjtwyBCeabsYjU5tX+7y7eQP/RhCxUszbGx4qNKwWCj2zHdVpTbeT
 g==;
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="335579553"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="335579553"
Received: from orsmga002.jf.intel.com ([10.7.209.21])
 by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 10 Mar 2023 22:38:15 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="678086697"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="678086697"
Received: from relo-linux-5.jf.intel.com ([10.165.21.152])
 by orsmga002.jf.intel.com with ESMTP; 10 Mar 2023 22:38:15 -0800
From: John.C.Harrison@Intel.com
To:
 Intel-GFX@Lists.FreeDesktop.Org
Date: Fri, 10 Mar 2023 22:37:13 -0800
Message-Id: <20230311063714.570389-3-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230311063714.570389-1-John.C.Harrison@Intel.com>
References: <20230311063714.570389-1-John.C.Harrison@Intel.com>
MIME-Version: 1.0
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way,
 Swindon SN3 1RJ
Subject: [Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Clean up of register
 capture search
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alan Previn, DRI-Devel@Lists.FreeDesktop.Org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

From: John Harrison

The comparison in the search for a matching register capture node was
not the most readable. It was also assuming that a zero GuC id means
invalid, which it does not. So remove one invalid term, one redundant
term and re-format to keep each term on a single line, and only one
term per line.

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 36196cbb24c6b..cf49188db6a6e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1616,9 +1616,8 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
 	list_for_each_entry_safe(n, ntmp, &guc->capture->outlist, link) {
 		if (n->eng_inst == GUC_ID_TO_ENGINE_INSTANCE(ee->engine->guc_id) &&
 		    n->eng_class == GUC_ID_TO_ENGINE_CLASS(ee->engine->guc_id) &&
-		    n->guc_id && n->guc_id == ce->guc_id.id &&
-		    (n->lrca & CTX_GTT_ADDRESS_MASK) && (n->lrca & CTX_GTT_ADDRESS_MASK) ==
-		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) {
+		    n->guc_id == ce->guc_id.id &&
+		    (n->lrca & CTX_GTT_ADDRESS_MASK) == (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) {
 			list_del(&n->link);
 			ee->guc_capture_node = n;
 			ee->guc_capture = guc->capture;

From patchwork Sat Mar 11 06:37:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13170644
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 66FF0C6FD1E
 for ; Sat, 11 Mar 2023 06:38:25 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 3853B10EA52;
 Sat, 11 Mar 2023 06:38:18 +0000 (UTC)
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5C19610E0D1;
 Sat, 11 Mar 2023 06:38:16 +0000 (UTC)
DKIM-Signature: v=1;
 a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1678516696; x=1710052696;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=d8h4HW5/JM9xOCNH7wDZWckq71z5KOcSv3cidNaFdQo=;
 b=QMgk3a0qGgtcrwV+jsRSB90Ipl/P5n2DmP1LU/sEOmxdh7lpMWkc1zhi
 UEp5mbBmqcekjP2HxCrcC64TNt4OleogzdUlEUi3l9019wF/aXhksFp4V
 mGcWpLbwc2b95Xrw5D8bpsWpXVMX0R4W9vdagM3UCIA/00qTRb+yfe2rO
 gHnCOxc5iO5kRW16uNsZ/elBwRcpZYIxkEtG3S+lTAAaxGNryVcNfvrLk
 2sWxANChx6Cp1LwIeFVYcsbUrRlWrSzuAhzcCGZAPqm5WRJ9GpMEgOlYz
 qsFgaTNiPMQ9Dvq+7G3LWU8R9sLnJch4QrFo45cA0x9iW6027MvpwGApX
 A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="335579554"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="335579554"
Received: from orsmga002.jf.intel.com ([10.7.209.21])
 by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 10 Mar 2023 22:38:15 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10645"; a="678086700"
X-IronPort-AV: E=Sophos;i="5.98,252,1673942400";
 d="scan'208";a="678086700"
Received: from relo-linux-5.jf.intel.com ([10.165.21.152])
 by orsmga002.jf.intel.com with ESMTP; 10 Mar 2023 22:38:15 -0800
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Date: Fri, 10 Mar 2023 22:37:14 -0800
Message-Id: <20230311063714.570389-4-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230311063714.570389-1-John.C.Harrison@Intel.com>
References: <20230311063714.570389-1-John.C.Harrison@Intel.com>
MIME-Version: 1.0
Organization: Intel Corporation (UK) Ltd. - Co. Reg.
 #1134945 - Pipers Way, Swindon SN3 1RJ
Subject: [Intel-gfx] [PATCH v3 3/3] drm/i915: Include timeline seqno in
 error capture
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alan Previn, DRI-Devel@Lists.FreeDesktop.Org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

From: John Harrison

The seqno value actually written out to memory is no longer in the
regular HWSP. Instead, it is now in its own private timeline
buffer. Thus, it is no longer visible in an error capture. So,
explicitly read the value and include that in the capture.

v2: %d -> %u (Alan)

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++
 drivers/gpu/drm/i915/i915_gpu_error.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 904f21e1380cd..f020c0086fbcd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -505,6 +505,7 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 		   header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
 		   ctx->guilty, ctx->active,
 		   ctx->total_runtime, ctx->avg_runtime);
+	err_printf(m, " context timeline seqno %u\n", ctx->hwsp_seqno);
 }
 
 static struct i915_vma_coredump *
@@ -1395,6 +1396,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->sched_attr = ctx->sched;
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
+	e->hwsp_seqno = (ce->timeline && ce->timeline->hwsp_seqno) ?
+				*ce->timeline->hwsp_seqno : ~0U;
 
 	e->total_runtime = intel_context_get_total_runtime_ns(ce);
 	e->avg_runtime = intel_context_get_avg_runtime_ns(ce);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 56027ffbce51f..a91932cc65317 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -107,6 +107,7 @@ struct intel_engine_coredump {
 		int active;
 		int guilty;
 		struct i915_sched_attr sched_attr;
+		u32 hwsp_seqno;
 	} context;
 
 	struct i915_vma_coredump *vma;