From patchwork Fri Feb 17 02:24:18 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13144210
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Subject: [PATCH v2 1/3] drm/i915/guc: Fix missing ecodes
Date: Thu, 16 Feb 2023 18:24:18 -0800
Message-Id: <20230217022420.2664116-2-John.C.Harrison@Intel.com>
In-Reply-To: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
References: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
Cc: Matthew Brost, Tvrtko Ursulin, Michael Cheng, Alan Previn,
 Matt Roper, Matthew Auld, Lucas De Marchi, Daniele Ceraolo Spurio,
 DRI-Devel@Lists.FreeDesktop.Org, Aravind Iddamsetty, Rodrigo Vivi,
 Umesh Nerlige Ramappa, John Harrison, Bruce Chang

From: John Harrison

Error captures are tagged with an 'ecode'. This is a pseudo-unique magic
number that is meant to distinguish similar-seeming bugs with different
underlying signatures. It is a combination of two ring state registers.
Unfortunately, the register state being used is only valid in execlist
mode. In GuC mode, the register state exists in a separate list of
arbitrary register address/value pairs rather than the named entry
structure. So, search through that list to find the two exciting
registers and copy them over to the structure's named members.

v2: if else if instead of if if (Alan)

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
Cc: Alan Previn
Cc: Umesh Nerlige Ramappa
Cc: Lucas De Marchi
Cc: Jani Nikula
Cc: Joonas Lahtinen
Cc: Rodrigo Vivi
Cc: Tvrtko Ursulin
Cc: Matt Roper
Cc: Aravind Iddamsetty
Cc: Michael Cheng
Cc: Matthew Brost
Cc: Bruce Chang
Cc: Daniele Ceraolo Spurio
Cc: Matthew Auld
---
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 101d44de729b1..36196cbb24c6b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1562,6 +1562,27 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
 
 #endif //CONFIG_DRM_I915_CAPTURE_ERROR
 
+static void guc_capture_find_ecode(struct intel_engine_coredump *ee)
+{
+	struct gcap_reg_list_info *reginfo;
+	struct guc_mmio_reg *regs;
+	i915_reg_t reg_ipehr = RING_IPEHR(0);
+	i915_reg_t reg_instdone = RING_INSTDONE(0);
+	int i;
+
+	if (!ee->guc_capture_node)
+		return;
+
+	reginfo = ee->guc_capture_node->reginfo + GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE;
+	regs = reginfo->regs;
+	for (i = 0; i < reginfo->num_regs; i++) {
+		if (regs[i].offset == reg_ipehr.reg)
+			ee->ipehr = regs[i].value;
+		else if (regs[i].offset == reg_instdone.reg)
+			ee->instdone.instdone = regs[i].value;
+	}
+}
+
 void intel_guc_capture_free_node(struct intel_engine_coredump *ee)
 {
 	if (!ee || !ee->guc_capture_node)
@@ -1601,6 +1622,7 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
 			list_del(&n->link);
 			ee->guc_capture_node = n;
 			ee->guc_capture = guc->capture;
+			guc_capture_find_ecode(ee);
 			return;
 		}
 	}

From patchwork Fri Feb 17 02:24:19 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13144209
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Subject: [PATCH v2 2/3] drm/i915/guc: Clean up of register capture search
Date: Thu, 16 Feb 2023 18:24:19 -0800
Message-Id: <20230217022420.2664116-3-John.C.Harrison@Intel.com>
In-Reply-To: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
References: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
Cc: Alan Previn, John Harrison, DRI-Devel@Lists.FreeDesktop.Org

From: John Harrison

The comparison in the search for a matching register capture node was
not the most readable. It was also assuming that a zero GuC id means
invalid, which it does not. So remove one invalid term, one redundant
term and re-format to keep each term on a single line, and only one
term per line.

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 36196cbb24c6b..cf49188db6a6e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1616,9 +1616,8 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
 	list_for_each_entry_safe(n, ntmp, &guc->capture->outlist, link) {
 		if (n->eng_inst == GUC_ID_TO_ENGINE_INSTANCE(ee->engine->guc_id) &&
 		    n->eng_class == GUC_ID_TO_ENGINE_CLASS(ee->engine->guc_id) &&
-		    n->guc_id && n->guc_id == ce->guc_id.id &&
-		    (n->lrca & CTX_GTT_ADDRESS_MASK) && (n->lrca & CTX_GTT_ADDRESS_MASK) ==
-		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) {
+		    n->guc_id == ce->guc_id.id &&
+		    (n->lrca & CTX_GTT_ADDRESS_MASK) == (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) {
 			list_del(&n->link);
 			ee->guc_capture_node = n;
 			ee->guc_capture = guc->capture;

From patchwork Fri Feb 17 02:24:20 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13144207
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Subject: [PATCH v2 3/3] drm/i915: Include timeline seqno in error capture
Date: Thu, 16 Feb 2023 18:24:20 -0800
Message-Id: <20230217022420.2664116-4-John.C.Harrison@Intel.com>
In-Reply-To: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
References: <20230217022420.2664116-1-John.C.Harrison@Intel.com>
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org

From: John Harrison

The seqno value actually written out to memory is no longer in the
regular HWSP. Instead, it is now in its own private timeline buffer.
Thus, it is no longer visible in an error capture. So, explicitly read
the value and include that in the capture.

Signed-off-by: John Harrison
Reviewed-by: Alan Previn
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++
 drivers/gpu/drm/i915/i915_gpu_error.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 904f21e1380cd..036a65c9cbf67 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -505,6 +505,7 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 		   header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
 		   ctx->guilty, ctx->active,
 		   ctx->total_runtime, ctx->avg_runtime);
+	err_printf(m, " context timeline seqno %d\n", ctx->hwsp_seqno);
 }
 
 static struct i915_vma_coredump *
@@ -1395,6 +1396,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->sched_attr = ctx->sched;
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
+	e->hwsp_seqno = (ce->timeline && ce->timeline->hwsp_seqno) ?
+				*ce->timeline->hwsp_seqno : ~0U;
 
 	e->total_runtime = intel_context_get_total_runtime_ns(ce);
 	e->avg_runtime = intel_context_get_avg_runtime_ns(ce);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 56027ffbce51f..a91932cc65317 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -107,6 +107,7 @@ struct intel_engine_coredump {
 		int active;
 		int guilty;
 		struct i915_sched_attr sched_attr;
+		u32 hwsp_seqno;
 	} context;
 
 	struct i915_vma_coredump *vma;
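
Illustration only, not part of the series: a minimal user-space sketch of
the guarded seqno read that patch 3/3 adds to record_context(). The types
struct timeline and struct context, the SEQNO_UNAVAILABLE macro and both
helper functions are hypothetical stand-ins, not i915 definitions. The
sketch formats the value with %u; that is an assumption about the desired
output, since the patch prints the u32 field with %d, where the ~0U
fallback would typically show up as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct timeline {
	const uint32_t *hwsp_seqno;	/* seqno slot in the timeline buffer, may be NULL */
};

struct context {
	struct timeline *timeline;	/* may be NULL */
};

/* Hypothetical name for the ~0U fallback used when no seqno can be read. */
#define SEQNO_UNAVAILABLE (~0U)

/* Dereference the seqno slot only when both pointers are valid. */
static uint32_t read_timeline_seqno(const struct context *ce)
{
	return (ce->timeline && ce->timeline->hwsp_seqno) ?
		*ce->timeline->hwsp_seqno : SEQNO_UNAVAILABLE;
}

/* Format the captured value as unsigned; returns the snprintf() result. */
static int format_seqno(char *buf, size_t len, uint32_t seqno)
{
	assert(buf && len);
	return snprintf(buf, len, "context timeline seqno %u", (unsigned int)seqno);
}
```

A caller would pass the captured context to read_timeline_seqno() and hand
the result to format_seqno(); a context without a timeline, or a timeline
without a seqno slot, yields the all-ones fallback rather than a NULL
dereference.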