From patchwork Fri Sep 20 03:19:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 97D0CCF58C3 for ; Fri, 20 Sep 2024 03:20:18 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 22E5210E786; Fri, 20 Sep 2024 03:20:10 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="hrDsqoSN"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 05C1010E77B; Fri, 20 Sep 2024 03:20:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HZeQs53wm6ed54JkpA6N1W85jW5iKCuP6XPXOUbb8nw=; b=hrDsqoSNtGS/nPfe70fsSSmAfZ3+VhTZvoINL6TYLgBcL1WLgsmIFJ33 08mmIOSFBIcdUipeuG0ob0GRMP2JWXlAGRqeBuMXiRs0Wuzbq5+HXwD2T LKMKv4X5uxOg1jNC/Sm6q0/3csJKnqtyJUuhmN6mE2MGpDS13p9KXJJ3s spFFdqTN3uTtMqmJQwG+PaYzGNGkgURNGZHRDUSyFRncvERj64dy3tzUK DqAY14w5Z1mi9+xIUfcGCRHMti2GQCDJLOiz2xSx44E1nJEZ0rCXLSo+2 WJOxdKz5DSuv4ivG8uKOtleWu64fE/+oqD6m2GWNHu8pzw6IqiIlPQ0gH A==; X-CSE-ConnectionGUID: EGB1lWYVREyzzYL/k8zv7Q== X-CSE-MsgGUID: ypQamg57QLG7zKqwYVkL2w== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269797" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269797" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:07 -0700 X-CSE-ConnectionGUID: 2OtUVmhJRTm8SXq58GXczg== X-CSE-MsgGUID: yzS07JVOSsmUhjbD2MRlUA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746170" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:07 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison , Michal Wajdeczko , Julia Filipchuk Subject: [PATCH v8 01/11] drm/xe/guc: Remove spurious line feed in debug print Date: Thu, 19 Sep 2024 20:19:56 -0700 Message-ID: <20240920032007.629624-2-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Including line feeds at the start of a debug print messes up the output when sent to dmesg. The break appears between all the useful prefix information and the actual string being printed. In this case, each block of data has a very clear start line and an extra delimeter is really not necessary. So don't do it. v2: Fix typo in commit message (review feedback from Michal W.) Signed-off-by: John Harrison Reviewed-by: Michal Wajdeczko Reviewed-by: Julia Filipchuk --- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 4b95f75b1546..a63fe0a9077a 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -1523,7 +1523,7 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, drm_puts(p, "H2G CTB (all sizes in DW):\n"); guc_ctb_snapshot_print(&snapshot->h2g, p); - drm_puts(p, "\nG2H CTB (all sizes in DW):\n"); + drm_puts(p, "G2H CTB (all sizes in DW):\n"); guc_ctb_snapshot_print(&snapshot->g2h, p); drm_printf(p, "\tg2h outstanding: %d\n", From patchwork Fri Sep 20 03:19:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808091 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D4143CF58C5 for ; Fri, 20 Sep 2024 03:20:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3777D10E789; Fri, 20 Sep 2024 03:20:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="d1eneOB6"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2C17810E6FD; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=d0jz/sNpxhur+Y9GdZN/bJoyTMBOCVGBWF6yDOExCi0=; b=d1eneOB6AMduPHO3VmLRwb3K1Ne59sGb2cuUSMwLoQTaiynxdJiIyspd k0KzlCeNNRiBXDR6tSV2BzO5DhQk4z4oefUhJlY8m9LtaX7wTC02xBHdG IMFTrXy7frogSRLI5M4KE7mvbZzVLIdZKcSM/AjIKSXpgL98IxOfg7JRU NXFsAaSr4XNCpU3hXqEO/BHaLq6CXah8ur0Dh6GjlcouM8LC7r8tP/VtB drMM7oPOI2GU6q0HvpgHGWpIKHGynfRobzi24D2FtLD2OHGOeIqU9jnzU sdCTruj+2GX3e9GjBo+/h4m/YWemeqCkFwwcnENjvEcJhShLyiKfgGKDz Q==; X-CSE-ConnectionGUID: Rx1YiTwrQJuciMNhxtnQuw== X-CSE-MsgGUID: he+YD2H+Q1+OF0SCUYkIdA== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269798" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269798" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: NCN/zkzMSwirEMksVObFJg== X-CSE-MsgGUID: 5jwMRB4xRnKQdCHKi4KjEw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746173" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:07 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison Subject: [PATCH v8 02/11] drm/xe/devcoredump: Use drm_puts and already cached local variables Date: Thu, 19 Sep 2024 20:19:57 -0700 Message-ID: <20240920032007.629624-3-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison There are a bunch of calls to drm_printf with static strings. Switch them to drm_puts instead. There are also a bunch of 'coredump->snapshot.XXX' references when 'coredump->snapshot' has alread been cached locally as 'ss'. So use 'ss->XXX' instead. Signed-off-by: John Harrison --- drivers/gpu/drm/xe/xe_devcoredump.c | 40 ++++++++++++++--------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index bdb76e834e4c..d23719d5c2a3 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -85,9 +85,9 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, p = drm_coredump_printer(&iter); - drm_printf(&p, "**** Xe Device Coredump ****\n"); - drm_printf(&p, "kernel: " UTS_RELEASE "\n"); - drm_printf(&p, "module: " KBUILD_MODNAME "\n"); + drm_puts(&p, "**** Xe Device Coredump ****\n"); + drm_puts(&p, "kernel: " UTS_RELEASE "\n"); + drm_puts(&p, "module: " KBUILD_MODNAME "\n"); ts = ktime_to_timespec64(ss->snapshot_time); drm_printf(&p, "Snapshot time: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); @@ -96,20 +96,20 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, drm_printf(&p, "Process: %s\n", ss->process_name); xe_device_snapshot_print(xe, &p); - drm_printf(&p, "\n**** GuC CT ****\n"); - xe_guc_ct_snapshot_print(coredump->snapshot.ct, &p); - xe_guc_exec_queue_snapshot_print(coredump->snapshot.ge, &p); + drm_puts(&p, "\n**** GuC CT ****\n"); + xe_guc_ct_snapshot_print(ss->ct, &p); + xe_guc_exec_queue_snapshot_print(ss->ge, &p); - drm_printf(&p, "\n**** Job ****\n"); - xe_sched_job_snapshot_print(coredump->snapshot.job, &p); + drm_puts(&p, "\n**** Job ****\n"); + xe_sched_job_snapshot_print(ss->job, &p); - drm_printf(&p, "\n**** HW Engines ****\n"); + drm_puts(&p, "\n**** HW Engines ****\n"); for (i = 0; i < XE_NUM_HW_ENGINES; i++) - if (coredump->snapshot.hwe[i]) - xe_hw_engine_snapshot_print(coredump->snapshot.hwe[i], - &p); - drm_printf(&p, "\n**** VM state ****\n"); - xe_vm_snapshot_print(coredump->snapshot.vm, &p); + if (ss->hwe[i]) + xe_hw_engine_snapshot_print(ss->hwe[i], &p); + + drm_puts(&p, "\n**** VM state ****\n"); + xe_vm_snapshot_print(ss->vm, &p); return count - iter.remain; } @@ -247,18 +247,18 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, if (xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL)) xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); - coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true); - coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(q); - coredump->snapshot.job = xe_sched_job_snapshot_capture(job); - coredump->snapshot.vm = xe_vm_snapshot_capture(q->vm); + ss->ct = xe_guc_ct_snapshot_capture(&guc->ct, true); + ss->ge = xe_guc_exec_queue_snapshot_capture(q); + ss->job = xe_sched_job_snapshot_capture(job); + ss->vm = xe_vm_snapshot_capture(q->vm); for_each_hw_engine(hwe, q->gt, id) { if (hwe->class != q->hwe->class || !(BIT(hwe->logical_instance) & adj_logical_mask)) { - coredump->snapshot.hwe[id] = NULL; + ss->hwe[id] = NULL; continue; } - coredump->snapshot.hwe[id] = xe_hw_engine_snapshot_capture(hwe); + ss->hwe[id] = xe_hw_engine_snapshot_capture(hwe); } queue_work(system_unbound_wq, &ss->work); From patchwork Fri Sep 20 03:19:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808088 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 351ABCF58C4 for ; Fri, 20 Sep 2024 03:20:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 82B0B10E6FD; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="PgSbHy0J"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 50EAE10E77B; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vgTYRvbNN6vtdOgxWqXoi+clJRO6dmTcyaYBqjo+4BA=; b=PgSbHy0JTewcof+Dto5FZCfis34Jg0Cf0KofRAl3bZYYZknQdxYr4oyF 4VyUIh7oU7CsMGVftun/u66/AbWew537bE7Ms1qBPAZg+BDAiZbRso/NR RLfP/E5MfXCo2XNDLQBvtZK3vvTSk1KBREHNkwCnxORs6ZkgUi1GKakqL 4wExGMxIDQ3SWtYzrk4LlXWNsczHHRx8bVNWKpQrgDp/1pe2jtFKBmMhA 2bk/1FYojHo6+w46PPa03eMNTazKjgfEIjtVlUvRKNulZ9Ib4xpRD5XNX SZyO6TppB1mZKYc/YEeRkrOtTMYzXVAKd60yKvi/bWFd0MrMCHXeFBV2/ g==; X-CSE-ConnectionGUID: n+GcdtAZRJOGZpvfk6U3Tw== X-CSE-MsgGUID: SOGVxyBJQK+9uY3aXYkOaA== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269800" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269800" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: hNFn6fGSRUiqIuNh0HRIrA== X-CSE-MsgGUID: B+PmJOz2THGp0g/2Lmz2tA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746176" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison Subject: [PATCH v8 03/11] drm/xe/devcoredump: Improve section headings and add tile info Date: Thu, 19 Sep 2024 20:19:58 -0700 Message-ID: <20240920032007.629624-4-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison The xe_guc_exec_queue_snapshot is not really a GuC internal thing and is definitely not a GuC CT thing. So give it its own section heading. The snapshot itself is really a capture of the submission backend's internal state. Although all it currently prints out is the submission contexts. So label it as 'Contexts'. If more general state is added later then it could be change to 'Submission backend' or some such. Further, everything from the GuC CT section onwards is GT specific but there was no indication of which GT it was related to (and that is impossible to work out from the other fields that are given). So add a GT section heading. Also include the tile id of the GT, because again significant information. Lastly, drop a couple of unnecessary line feeds within sections. v2: Add GT section heading, add tile id to device section. Signed-off-by: John Harrison --- drivers/gpu/drm/xe/xe_devcoredump.c | 5 +++++ drivers/gpu/drm/xe/xe_devcoredump_types.h | 3 ++- drivers/gpu/drm/xe/xe_device.c | 1 + drivers/gpu/drm/xe/xe_guc_submit.c | 2 +- drivers/gpu/drm/xe/xe_hw_engine.c | 1 - 5 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index d23719d5c2a3..2690f1d1cde4 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -96,8 +96,13 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, drm_printf(&p, "Process: %s\n", ss->process_name); xe_device_snapshot_print(xe, &p); + drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id); + drm_printf(&p, "\tTile: %d\n", ss->gt->tile->id); + drm_puts(&p, "\n**** GuC CT ****\n"); xe_guc_ct_snapshot_print(ss->ct, &p); + + drm_puts(&p, "\n**** Contexts ****\n"); xe_guc_exec_queue_snapshot_print(ss->ge, &p); drm_puts(&p, "\n**** Job ****\n"); diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 440d05d77a5a..3cc2f095fdfb 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -37,7 +37,8 @@ struct xe_devcoredump_snapshot { /* GuC snapshots */ /** @ct: GuC CT snapshot */ struct xe_guc_ct_snapshot *ct; - /** @ge: Guc Engine snapshot */ + + /** @ge: GuC Submission Engine snapshot */ struct xe_guc_submit_exec_queue_snapshot *ge; /** @hwe: HW Engine snapshot array */ diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 4d3c794f134c..178e5346979c 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -955,6 +955,7 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p) for_each_gt(gt, xe, id) { drm_printf(p, "GT id: %u\n", id); + drm_printf(p, "\tTile: %u\n", gt->tile->id); drm_printf(p, "\tType: %s\n", gt->info.type == XE_GT_TYPE_MAIN ? "main" : "media"); drm_printf(p, "\tIP ver: %u.%u.%u\n", diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index a98b85129076..4bc5793f627b 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -2209,7 +2209,7 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps if (!snapshot) return; - drm_printf(p, "\nGuC ID: %d\n", snapshot->guc.id); + drm_printf(p, "GuC ID: %d\n", snapshot->guc.id); drm_printf(p, "\tName: %s\n", snapshot->name); drm_printf(p, "\tClass: %d\n", snapshot->class); drm_printf(p, "\tLogical mask: 0x%x\n", snapshot->logical_mask); diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index a7abc4b67e67..3ae3713f503b 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -1057,7 +1057,6 @@ void xe_hw_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, if (snapshot->hwe->class == XE_ENGINE_CLASS_COMPUTE) drm_printf(p, "\tRCU_MODE: 0x%08x\n", snapshot->reg.rcu_mode); - drm_puts(p, "\n"); } /** From patchwork Fri Sep 20 03:19:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808086 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 60A2FCF58C1 for ; Fri, 20 Sep 2024 03:20:11 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 87C1D10E77F; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="EaGAQuem"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7654E10E6FD; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Fs+XtVZS44nUEnynpVBODDwfTEqbVLcWu7ct5t+H1QI=; b=EaGAQuem4Rj+UbGzDXgx1au3BvokCC4W1MOgxys8Moz+FKg6vPW2AfIK Pzr7keIcyWYl6zkXoyIGpIwfLlSRX7NMinepO6cy5xmuffDtzRp8ruQWg Ifl/aLYdOIhCD/2qP0Fv054LbFFj2qGliV0NBxsCYgrDTTqiMtYbqjTZA iKS524w4GIbxgBj/N4TAkpp6tyl8gofOejYOVAPdGwrUOMO0SDwSbOZtw 3OcL3e4sNcFxgkm2n1rdpMC6K7WCZqwANSqUpRihBVOh67HwSeAyTHsb2 jNpxmAKMk4NO4ThT24RdxuL/e0PzWLU4OkLkml7w2MTUZFqi+h22zcmSR g==; X-CSE-ConnectionGUID: tfWEpHUySsusv6uR3JYI/g== X-CSE-MsgGUID: s2L9hy2fSmKcR59LoMWS3g== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269801" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269801" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: VTX0X5azQ1uuEwZDbahpEA== X-CSE-MsgGUID: MGoCovnGSEmtNCAC1hmFlw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746179" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison Subject: [PATCH v8 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function Date: Thu, 19 Sep 2024 20:19:59 -0700 Message-ID: <20240920032007.629624-5-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison There is a need to include the GuC log and other large binary objects in core dumps and via dmesg. So add a helper for dumping to a printer function via conversion to ASCII85 encoding. Another issue with dumping such a large buffer is that it can be slow, especially if dumping to dmesg over a serial port. So add a yield to prevent the 'task has been stuck for 120s' kernel hang check feature from firing. v2: Add a prefix to the output string. Fix memory allocation bug. v3: Correct a string size calculation and clean up a define (review feedback from Julia F). Signed-off-by: John Harrison --- drivers/gpu/drm/xe/xe_devcoredump.c | 87 +++++++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++ 2 files changed, 93 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 2690f1d1cde4..0884c49942fe 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -6,6 +6,7 @@ #include "xe_devcoredump.h" #include "xe_devcoredump_types.h" +#include #include #include @@ -315,3 +316,89 @@ int xe_devcoredump_init(struct xe_device *xe) } #endif + +/** + * xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85 + * + * The output is split to multiple lines because some print targets, e.g. dmesg + * cannot handle arbitrarily long lines. Note also that printing to dmesg in + * piece-meal fashion is not possible, each separate call to drm_puts() has a + * line-feed automatically added! Therefore, the entire output line must be + * constructed in a local buffer first, then printed in one atomic output call. + * + * There is also a scheduler yield call to prevent the 'task has been stuck for + * 120s' kernel hang check feature from firing when printing to a slow target + * such as dmesg over a serial port. + * + * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down. + * + * @p: the printer object to output to + * @prefix: optional prefix to add to output string + * @blob: the Binary Large OBject to dump out + * @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32) + * @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32) + */ +void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, + const void *blob, size_t offset, size_t size) +{ + const u32 *blob32 = (const u32 *)blob; + char buff[ASCII85_BUFSZ], *line_buff; + size_t line_pos = 0; + +#define DMESG_MAX_LINE_LEN 800 +#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */ + + if (size & 3) + drm_printf(p, "Size not word aligned: %zu", size); + if (offset & 3) + drm_printf(p, "Offset not word aligned: %zu", size); + + line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL); + if (IS_ERR_OR_NULL(line_buff)) { + drm_printf(p, "Failed to allocate line buffer: %pe", line_buff); + return; + } + + blob32 += offset / sizeof(*blob32); + size /= sizeof(*blob32); + + if (prefix) { + strscpy(line_buff, prefix, DMESG_MAX_LINE_LEN - MIN_SPACE - 2); + line_pos = strlen(line_buff); + + line_buff[line_pos++] = ':'; + line_buff[line_pos++] = ' '; + } + + while (size--) { + u32 val = *(blob32++); + + strscpy(line_buff + line_pos, ascii85_encode(val, buff), + DMESG_MAX_LINE_LEN - line_pos); + line_pos += strlen(line_buff + line_pos); + + if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) { + line_buff[line_pos++] = '\n'; + line_buff[line_pos++] = 0; + + drm_puts(p, line_buff); + + line_pos = 0; + + /* Prevent 'stuck thread' time out errors */ + cond_resched(); + } + } + + if (line_pos) { + line_buff[line_pos++] = '\n'; + line_buff[line_pos++] = 0; + + drm_puts(p, line_buff); + } + + kfree(line_buff); + +#undef MIN_SPACE +#undef DMESG_MAX_LINE_LEN +} diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h index e2fa65ce0932..a4eebc285fc8 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.h +++ b/drivers/gpu/drm/xe/xe_devcoredump.h @@ -6,6 +6,9 @@ #ifndef _XE_DEVCOREDUMP_H_ #define _XE_DEVCOREDUMP_H_ +#include + +struct drm_printer; struct xe_device; struct xe_sched_job; @@ -23,4 +26,7 @@ static inline int xe_devcoredump_init(struct xe_device *xe) } #endif +void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, + const void *blob, size_t offset, size_t size); + #endif From patchwork Fri Sep 20 03:20:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808090 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 14529CF58C4 for ; Fri, 20 Sep 2024 03:20:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1FEB310E78A; Fri, 20 Sep 2024 03:20:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="FAmx6h7y"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9F32A10E77B; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6bq7x1TR5KHZFiCZoS9iUTEeAoF/JQCRzOhFtLnmQd4=; b=FAmx6h7yKDIpb9WzplIcXYr6HFEVU/PVyEk5nAwCd1F2+zeZvMKlPAs8 rpfocP+ks0FgwLm31OoqnuT+nZW1IR35QP2lDJM5GLXDerEH5pUnRrRAX Bv406dRi6KqD9iq8KY7IAcoLIk7p0jtbU2zYBNtCFA1Ti7YtPQPdloEc2 Mlir8PL8bivwy0d7w9GPuZR9ndiPLH8xq2URVaX39DpkOxL+KgQOKSEe3 9aKHLpLmndEZ+wpdZ+xr7tuO7Gaa0xEDxYJPiqP/bB15tjtiuuyr+cFTS 60Ldmc8eceerGWDgMkWceKiG4x488QetNqTiWVMYDyZn/M2ecsyB+zWDf w==; X-CSE-ConnectionGUID: HZZItTnmQ3OnL4zlSQdLgw== X-CSE-MsgGUID: uPOHiAyRTW64zC9RX9aDtQ== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269802" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269802" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: oRYT9IlsSnWsB0RY3Mei7g== X-CSE-MsgGUID: 6nnmlxJ4S9iAIB3lSiPSsg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746182" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison , Julia Filipchuk Subject: [PATCH v8 05/11] drm/xe/guc: Copy GuC log prior to dumping Date: Thu, 19 Sep 2024 20:20:00 -0700 Message-ID: <20240920032007.629624-6-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Add an extra stage to the GuC log print to copy the log buffer into regular host memory first, rather than printing the live GPU buffer object directly. Doing so helps prevent inconsistencies due to the log being updated as it is being dumped. It also allows the use of the ASCII85 helper function for printing the log in a more compact form than a straight hex dump. v2: Use %zx instead of %lx for size_t prints. v3: Replace hexdump code with ascii85 call (review feedback from Matthew B). Move chunking code into next patch as that reduces the deltas of both. v4: Add a prefix to the ASCII85 output to aid tool parsing. Signed-off-by: John Harrison Reviewed-by: Julia Filipchuk --- drivers/gpu/drm/xe/xe_guc_log.c | 40 +++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index a37ee3419428..be47780ec2a7 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -6,9 +6,12 @@ #include "xe_guc_log.h" #include +#include #include "xe_bo.h" +#include "xe_devcoredump.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_map.h" #include "xe_module.h" @@ -49,32 +52,35 @@ static size_t guc_log_size(void) CAPTURE_BUFFER_SIZE; } +/** + * xe_guc_log_print - dump a copy of the GuC log to some useful location + * @log: GuC log structure + * @p: the printer object to output to + */ void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p) { struct xe_device *xe = log_to_xe(log); size_t size; - int i, j; + void *copy; - xe_assert(xe, log->bo); + if (!log->bo) { + drm_puts(p, "GuC log buffer not allocated"); + return; + } size = log->bo->size; -#define DW_PER_READ 128 - xe_assert(xe, !(size % (DW_PER_READ * sizeof(u32)))); - for (i = 0; i < size / sizeof(u32); i += DW_PER_READ) { - u32 read[DW_PER_READ]; - - xe_map_memcpy_from(xe, read, &log->bo->vmap, i * sizeof(u32), - DW_PER_READ * sizeof(u32)); -#define DW_PER_PRINT 4 - for (j = 0; j < DW_PER_READ / DW_PER_PRINT; ++j) { - u32 *print = read + j * DW_PER_PRINT; - - drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n", - *(print + 0), *(print + 1), - *(print + 2), *(print + 3)); - } + copy = vmalloc(size); + if (!copy) { + drm_printf(p, "Failed to allocate %zu", size); + return; } + + xe_map_memcpy_from(xe, copy, &log->bo->vmap, 0, size); + + xe_print_blob_ascii85(p, "Log data", copy, 0, size); + + vfree(copy); } int xe_guc_log_init(struct xe_guc_log *log) From patchwork Fri Sep 20 03:20:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808096 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0CCD9CF58C2 for ; Fri, 20 Sep 2024 03:20:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9189910E796; Fri, 20 Sep 2024 03:20:13 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="ciincApj"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id C6B0710E6FD; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802408; x=1758338408; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=p4whIsSIXdn2iaXsE/USUI4x7Nauuq9rFue569AXgGY=; b=ciincApjB+43K1ILXolD6MmjTweOUtA37fhb49yhdEQLfo/Re8yf7lu0 vwF76nu9JHHVufJJWEmtKRE+xbHG73CLStjk75mfRdbzGMpzgqilmTNQg ++v5iR+IuSXWR3rZ/A4f8G0dlqBTOwxXSTOWqLvF1r2ZZt74RLW9GJytB Vx7LRVxTarm3ZbIJrNCrAGzWCsNyHOX0gEj9tzu039mti6rMsZg18Ncmu D//YOW+1tKpPpa5RV1SDEYRZQxNLC6zJWuKiiMTVqS5NogVJk8yCra+k+ ZiUnvmJU6+8YjAYAAsCHbrhTXrKy6EGVA7INCb4oYjGjVfCfp0Nfpg/A1 A==; X-CSE-ConnectionGUID: 0WO7BM3YSXm/99hgks2XvQ== X-CSE-MsgGUID: yCCmqeqVQr2DxYWPmEntBw== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269803" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269803" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: zQQ8F3YXRK+H9nlagprURQ== X-CSE-MsgGUID: SupvIikATeS+u5NL23cQow== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746185" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison Subject: [PATCH v8 06/11] drm/xe/guc: Use a two stage dump for GuC logs and add more info Date: Thu, 19 Sep 2024 20:20:01 -0700 Message-ID: <20240920032007.629624-7-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Split the GuC log dump into a two stage snapshot and print mechanism. This allows the log to be captured at the point of an error (which may be in a restricted context) and then dump it out later (from a regular context such as a worker function or a sysfs file handler). Also add a bunch of other useful pieces of information that can help (or are fundamentally required!) to decode and parse the log. v2: Add kerneldoc and fix a couple of comment typos - review feedback from Michal W. v3: Move chunking code to this patch as it makes the deltas simpler. Fix a bunch of kerneldoc issues. v4: Move the CS frequency out of the coredump snapshot function into the debugfs only code (as that info is already part of the main devcoredump). Add a header to the debugfs log to match the one in the devcoredump to aid processing by a unified tool. Add forcewake to the GuC timestamp read so it actually works. v6: Add colon to GuC version string (review feedback by Julia F). Signed-off-by: John Harrison --- drivers/gpu/drm/xe/regs/xe_guc_regs.h | 1 + drivers/gpu/drm/xe/xe_guc_log.c | 178 +++++++++++++++++++++++--- drivers/gpu/drm/xe/xe_guc_log.h | 4 + drivers/gpu/drm/xe/xe_guc_log_types.h | 27 ++++ 4 files changed, 195 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h index a5fd14307f94..b27b73680c12 100644 --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h @@ -84,6 +84,7 @@ #define HUC_LOADING_AGENT_GUC REG_BIT(1) #define GUC_WOPCM_OFFSET_VALID REG_BIT(0) #define GUC_MAX_IDLE_COUNT XE_REG(0xc3e4) +#define GUC_PMTIMESTAMP XE_REG(0xc3e8) #define GUC_SEND_INTERRUPT XE_REG(0xc4c8) #define GUC_SEND_TRIGGER REG_BIT(0) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index be47780ec2a7..24564624e91e 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -6,15 +6,23 @@ #include "xe_guc_log.h" #include -#include +#include "regs/xe_guc_regs.h" #include "xe_bo.h" #include "xe_devcoredump.h" +#include "xe_force_wake.h" #include "xe_gt.h" #include "xe_gt_printk.h" #include "xe_map.h" +#include "xe_mmio.h" #include "xe_module.h" +static struct xe_guc * +log_to_guc(struct xe_guc_log *log) +{ + return container_of(log, struct xe_guc, log); +} + static struct xe_gt * log_to_gt(struct xe_guc_log *log) { @@ -52,35 +60,175 @@ static size_t guc_log_size(void) CAPTURE_BUFFER_SIZE; } +#define GUC_LOG_CHUNK_SIZE SZ_2M + +static struct xe_guc_log_snapshot *xe_guc_log_snapshot_alloc(struct xe_guc_log *log, bool atomic) +{ + struct xe_guc_log_snapshot *snapshot; + size_t remain; + int i; + + snapshot = kzalloc(sizeof(*snapshot), atomic ? GFP_ATOMIC : GFP_KERNEL); + if (!snapshot) + return NULL; + + /* + * NB: kmalloc has a hard limit well below the maximum GuC log buffer size. + * Also, can't use vmalloc as might be called from atomic context. So need + * to break the buffer up into smaller chunks that can be allocated. + */ + snapshot->size = log->bo->size; + snapshot->num_chunks = DIV_ROUND_UP(snapshot->size, GUC_LOG_CHUNK_SIZE); + + snapshot->copy = kcalloc(snapshot->num_chunks, sizeof(*snapshot->copy), + atomic ? GFP_ATOMIC : GFP_KERNEL); + if (!snapshot->copy) + goto fail_snap; + + remain = snapshot->size; + for (i = 0; i < snapshot->num_chunks; i++) { + size_t size = min(GUC_LOG_CHUNK_SIZE, remain); + + snapshot->copy[i] = kmalloc(size, atomic ? GFP_ATOMIC : GFP_KERNEL); + if (!snapshot->copy[i]) + goto fail_copy; + remain -= size; + } + + return snapshot; + +fail_copy: + for (i = 0; i < snapshot->num_chunks; i++) + kfree(snapshot->copy[i]); + kfree(snapshot->copy); +fail_snap: + kfree(snapshot); + return NULL; +} + /** - * xe_guc_log_print - dump a copy of the GuC log to some useful location + * xe_guc_log_snapshot_free - free a previously captured GuC log snapshot + * @snapshot: GuC log snapshot structure + * + * Return: pointer to a newly allocated snapshot object or null if out of memory. Caller is + * responsible for calling xe_guc_log_snapshot_free when done with the snapshot. + */ +void xe_guc_log_snapshot_free(struct xe_guc_log_snapshot *snapshot) +{ + int i; + + if (!snapshot) + return; + + if (!snapshot->copy) { + for (i = 0; i < snapshot->num_chunks; i++) + kfree(snapshot->copy[i]); + kfree(snapshot->copy); + } + + kfree(snapshot); +} + +/** + * xe_guc_log_snapshot_capture - create a new snapshot copy the GuC log for later dumping * @log: GuC log structure - * @p: the printer object to output to + * @atomic: is the call inside an atomic section of some kind? + * + * Return: pointer to a newly allocated snapshot object or null if out of memory. Caller is + * responsible for calling xe_guc_log_snapshot_free when done with the snapshot. */ -void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p) +struct xe_guc_log_snapshot *xe_guc_log_snapshot_capture(struct xe_guc_log *log, bool atomic) { + struct xe_guc_log_snapshot *snapshot; struct xe_device *xe = log_to_xe(log); - size_t size; - void *copy; + struct xe_guc *guc = log_to_guc(log); + struct xe_gt *gt = log_to_gt(log); + size_t remain; + int i, err; if (!log->bo) { - drm_puts(p, "GuC log buffer not allocated"); - return; + xe_gt_err(gt, "GuC log buffer not allocated\n"); + return NULL; + } + + snapshot = xe_guc_log_snapshot_alloc(log, atomic); + if (!snapshot) { + xe_gt_err(gt, "GuC log snapshot not allocated\n"); + return NULL; } - size = log->bo->size; + remain = snapshot->size; + for (i = 0; i < snapshot->num_chunks; i++) { + size_t size = min(GUC_LOG_CHUNK_SIZE, remain); + + xe_map_memcpy_from(xe, snapshot->copy[i], &log->bo->vmap, + i * GUC_LOG_CHUNK_SIZE, size); + remain -= size; + } + + err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); + if (err) { + snapshot->stamp = ~0; + } else { + snapshot->stamp = xe_mmio_read32(>->mmio, GUC_PMTIMESTAMP); + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + } + snapshot->ktime = ktime_get_boottime_ns(); + snapshot->level = log->level; + snapshot->ver_found = guc->fw.versions.found[XE_UC_FW_VER_RELEASE]; + snapshot->ver_want = guc->fw.versions.wanted; + snapshot->path = guc->fw.path; + + return snapshot; +} + +/** + * xe_guc_log_snapshot_print - dump a previously saved copy of the GuC log to some useful location + * @snapshot: a snapshot of the GuC log + * @p: the printer object to output to + */ +void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_printer *p) +{ + size_t remain; + int i; - copy = vmalloc(size); - if (!copy) { - drm_printf(p, "Failed to allocate %zu", size); + if (!snapshot) { + drm_printf(p, "GuC log snapshot not allocated!\n"); return; } - xe_map_memcpy_from(xe, copy, &log->bo->vmap, 0, size); + drm_printf(p, "GuC firmware: %s\n", snapshot->path); + drm_printf(p, "GuC version: %u.%u.%u (wanted %u.%u.%u)\n", + snapshot->ver_found.major, snapshot->ver_found.minor, snapshot->ver_found.patch, + snapshot->ver_want.major, snapshot->ver_want.minor, snapshot->ver_want.patch); + drm_printf(p, "Kernel timestamp: 0x%08llX [%llu]\n", snapshot->ktime, snapshot->ktime); + drm_printf(p, "GuC timestamp: 0x%08X [%u]\n", snapshot->stamp, snapshot->stamp); + drm_printf(p, "Log level: %u\n", snapshot->level); + + remain = snapshot->size; + for (i = 0; i < snapshot->num_chunks; i++) { + size_t size = min(GUC_LOG_CHUNK_SIZE, remain); + + xe_print_blob_ascii85(p, i ? NULL : "Log data", snapshot->copy[i], 0, size); + remain -= size; + } +} + +/** + * xe_guc_log_print - dump a copy of the GuC log to some useful location + * @log: GuC log structure + * @p: the printer object to output to + */ +void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p) +{ + struct xe_guc_log_snapshot *snapshot; - xe_print_blob_ascii85(p, "Log data", copy, 0, size); + drm_printf(p, "**** GuC Log ****\n"); - vfree(copy); + snapshot = xe_guc_log_snapshot_capture(log, false); + drm_printf(p, "CS reference clock: %u\n", log_to_gt(log)->info.reference_clock); + xe_guc_log_snapshot_print(snapshot, p); + xe_guc_log_snapshot_free(snapshot); } int xe_guc_log_init(struct xe_guc_log *log) diff --git a/drivers/gpu/drm/xe/xe_guc_log.h b/drivers/gpu/drm/xe/xe_guc_log.h index 2d25ab28b4b3..949d2c98343d 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.h +++ b/drivers/gpu/drm/xe/xe_guc_log.h @@ -9,6 +9,7 @@ #include "xe_guc_log_types.h" struct drm_printer; +struct xe_device; #if IS_ENABLED(CONFIG_DRM_XE_LARGE_GUC_BUFFER) #define CRASH_BUFFER_SIZE SZ_1M @@ -38,6 +39,9 @@ struct drm_printer; int xe_guc_log_init(struct xe_guc_log *log); void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p); +struct xe_guc_log_snapshot *xe_guc_log_snapshot_capture(struct xe_guc_log *log, bool atomic); +void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_printer *p); +void xe_guc_log_snapshot_free(struct xe_guc_log_snapshot *snapshot); static inline u32 xe_guc_log_get_level(struct xe_guc_log *log) diff --git a/drivers/gpu/drm/xe/xe_guc_log_types.h b/drivers/gpu/drm/xe/xe_guc_log_types.h index 125080d138a7..962b9edbd9eb 100644 --- a/drivers/gpu/drm/xe/xe_guc_log_types.h +++ b/drivers/gpu/drm/xe/xe_guc_log_types.h @@ -8,8 +8,35 @@ #include +#include "xe_uc_fw_types.h" + struct xe_bo; +/** + * struct xe_guc_log_snapshot: + * Capture of the GuC log plus various state useful for decoding the log + */ +struct xe_guc_log_snapshot { + /** @size: Size in bytes of the @copy allocation */ + size_t size; + /** @copy: Host memory copy of the log buffer for later dumping, split into chunks */ + void **copy; + /** @num_chunks: Number of chunks within @copy */ + int num_chunks; + /** @ktime: Kernel time the snapshot was taken */ + u64 ktime; + /** @stamp: GuC timestamp at which the snapshot was taken */ + u32 stamp; + /** @level: GuC log verbosity level */ + u32 level; + /** @ver_found: GuC firmware version */ + struct xe_uc_fw_version ver_found; + /** @ver_want: GuC firmware version that driver expected */ + struct xe_uc_fw_version ver_want; + /** @path: Path of GuC firmware blob */ + const char *path; +}; + /** * struct xe_guc_log - GuC log */ From patchwork Fri Sep 20 03:20:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808094 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 21AB9CF58C7 for ; Fri, 20 Sep 2024 03:20:26 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 39CB410E793; Fri, 20 Sep 2024 03:20:13 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="feEVr8+z"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id F07D110E77B; Fri, 20 Sep 2024 03:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802409; x=1758338409; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pE9rJGNQU6bQS1D6Z7/XELNdvYB1vUUF7MaKDUcK7KY=; b=feEVr8+z6Y2++qWOkwIIZIqClLd73m55eG2FOtorIXKKY0aom4r826QX 5wxCNit/fBd0Z6v+iVq8kxtnpinDZjOwMkWIK+wbtvWpVxgUu6rF7+uN/ WbeCyf2DynzIQmLHy4YqOQgeKhgRJiHjI6Sxl7ceI76GLg2Y/1/h3iaHS 12ojBaGZklrBNPrWEHVCvjuTp/7Ziuz0QLquKi8ZkPo4BByI2Veekocv8 FZQPU+QeANKHVSg4YecKiZYiNbG9xLcisW6AMu2BXQjldyUteDGgINyBb BtDHKNNSa6VQhib4lQAmkdRrVESI23YJTu1jzXCUcHmcn/3PhfDlWdlZa A==; X-CSE-ConnectionGUID: 8jI5pgWsTUGKac73hMQvRw== X-CSE-MsgGUID: ZYYH/9IBRJaVqRU9DrqQjg== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269805" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269805" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:08 -0700 X-CSE-ConnectionGUID: TUp3ed8KQWuMpHxgsBkPeg== X-CSE-MsgGUID: 60aHJeWASwmjl8jyil1xTw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746188" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, Michal Wajdeczko , Jani Nikula , John Harrison , dri-devel@lists.freedesktop.org Subject: [PATCH v8 07/11] drm/print: Introduce drm_line_printer Date: Thu, 19 Sep 2024 20:20:02 -0700 Message-ID: <20240920032007.629624-8-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Michal Wajdeczko This drm printer wrapper can be used to increase the robustness of the captured output generated by any other drm_printer to make sure we didn't lost any intermediate lines of the output by adding line numbers to each output line. Helpful for capturing some crash data. v2: Extended short int counters to full int (JohnH) Signed-off-by: Michal Wajdeczko Cc: Jani Nikula Cc: John Harrison Cc: dri-devel@lists.freedesktop.org Reviewed-by: Jani Nikula --- drivers/gpu/drm/drm_print.c | 14 ++++++++ include/drm/drm_print.h | 64 +++++++++++++++++++++++++++++++++++++ 2 files changed, 78 insertions(+) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index 0081190201a7..08cfea04e22b 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -235,6 +235,20 @@ void __drm_printfn_err(struct drm_printer *p, struct va_format *vaf) } EXPORT_SYMBOL(__drm_printfn_err); +void __drm_printfn_line(struct drm_printer *p, struct va_format *vaf) +{ + unsigned int counter = ++p->line.counter; + const char *prefix = p->prefix ?: ""; + const char *pad = p->prefix ? " " : ""; + + if (p->line.series) + drm_printf(p->arg, "%s%s%u.%u: %pV", + prefix, pad, p->line.series, counter, vaf); + else + drm_printf(p->arg, "%s%s%u: %pV", prefix, pad, counter, vaf); +} +EXPORT_SYMBOL(__drm_printfn_line); + /** * drm_puts - print a const string to a &drm_printer stream * @p: the &drm printer diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index d2676831d765..b3906dc04388 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -177,6 +177,10 @@ struct drm_printer { void *arg; const void *origin; const char *prefix; + struct { + unsigned int series; + unsigned int counter; + } line; enum drm_debug_category category; }; @@ -187,6 +191,7 @@ void __drm_puts_seq_file(struct drm_printer *p, const char *str); void __drm_printfn_info(struct drm_printer *p, struct va_format *vaf); void __drm_printfn_dbg(struct drm_printer *p, struct va_format *vaf); void __drm_printfn_err(struct drm_printer *p, struct va_format *vaf); +void __drm_printfn_line(struct drm_printer *p, struct va_format *vaf); __printf(2, 3) void drm_printf(struct drm_printer *p, const char *f, ...); @@ -411,6 +416,65 @@ static inline struct drm_printer drm_err_printer(struct drm_device *drm, return p; } +/** + * drm_line_printer - construct a &drm_printer that prefixes outputs with line numbers + * @p: the &struct drm_printer which actually generates the output + * @prefix: optional output prefix, or NULL for no prefix + * @series: optional unique series identifier, or 0 to omit identifier in the output + * + * This printer can be used to increase the robustness of the captured output + * to make sure we didn't lost any intermediate lines of the output. Helpful + * while capturing some crash data. + * + * Example 1:: + * + * void crash_dump(struct drm_device *drm) + * { + * static unsigned int id; + * struct drm_printer p = drm_err_printer(drm, "crash"); + * struct drm_printer lp = drm_line_printer(&p, "dump", ++id); + * + * drm_printf(&lp, "foo"); + * drm_printf(&lp, "bar"); + * } + * + * Above code will print into the dmesg something like:: + * + * [ ] 0000:00:00.0: [drm] *ERROR* crash dump 1.1: foo + * [ ] 0000:00:00.0: [drm] *ERROR* crash dump 1.2: bar + * + * Example 2:: + * + * void line_dump(struct device *dev) + * { + * struct drm_printer p = drm_info_printer(dev); + * struct drm_printer lp = drm_line_printer(&p, NULL, 0); + * + * drm_printf(&lp, "foo"); + * drm_printf(&lp, "bar"); + * } + * + * Above code will print:: + * + * [ ] 0000:00:00.0: [drm] 1: foo + * [ ] 0000:00:00.0: [drm] 2: bar + * + * RETURNS: + * The &drm_printer object + */ +static inline struct drm_printer drm_line_printer(struct drm_printer *p, + const char *prefix, + unsigned int series) +{ + struct drm_printer lp = { + .printfn = __drm_printfn_line, + .arg = p, + .prefix = prefix, + .line = { .series = series, }, + }; + return lp; +} + /* * struct device based logging * From patchwork Fri Sep 20 03:20:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808095 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6068ACF58C6 for ; Fri, 20 Sep 2024 03:20:25 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0493910E795; Fri, 20 Sep 2024 03:20:13 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="iZW7V/t2"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 23B8410E6FD; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802409; x=1758338409; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2EA07CHP74Jtj+3CNFXkgzHYFj+GgGu0dgxebYLIwks=; b=iZW7V/t2fN2tEykq9PglkqHKCvyREtbnahEgay4N4FihdNzC2u4xt/5Z QmItOKnLU0Obv72iA1OqDL0wek+H4TkJRFS22Kg7+Ymm7Ymc3jyUheGNW BuQH1mseQvMNxUtDSutuWfqUOn3GUue3hJFXmIZOuxY7mVAZjFr9Amx5v sIfmMvRB61h0HH8V1LxEpmIOWSBTZZVaMedrMU85bKU4IQ+JZS2UgnTwn ZFoo4daqrBxl3rfYSdjgqd8TRVVACb5rEPLVNMNRy8mgoxDECEFi7B6EF eLLYQjBBcFSDqGlSdPouapjM9ZZsRue0XEgGY5jyU0gs2PDUFq/v3rN6d A==; X-CSE-ConnectionGUID: Bmwt2UZATxaeEwL4BO+V/w== X-CSE-MsgGUID: f4ULDym0QkmDjsYnMQPHGw== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269806" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269806" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:09 -0700 X-CSE-ConnectionGUID: qabcQ9q+TW6VDrw5/AIQHA== X-CSE-MsgGUID: Pojo4VI9T4+D8L+PVFHnTg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746191" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:08 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison Subject: [PATCH v8 08/11] drm/xe/guc: Dead CT helper Date: Thu, 19 Sep 2024 20:20:03 -0700 Message-ID: <20240920032007.629624-9-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Add a worker function helper for asynchronously dumping state when an internal/fatal error is detected in CT processing. Being asynchronous is required to avoid deadlocks and scheduling-while-atomic or process-stalled-for-too-long issues. Also check for a bunch more error conditions and improve the handling of some existing checks. v2: Use compile time CONFIG check for new (but not directly CT_DEAD related) checks and use unsigned int for a bitmask, rename CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments, rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W. Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the entire mask to zero. v3: Fix kerneldoc v4: Nullify some floating pointers after free. v5: Add section headings and device info to make the state dump look more like a devcoredump to allow parsing by the same tools (eventual aim is to just call the devcoredump code itself, but that currently requires an xe_sched_job, which is not available in the CT code). v6: Fix potential for leaking snapshots with concurrent error conditions (review feedback from Julia F). Signed-off-by: John Harrison --- .../drm/xe/abi/guc_communication_ctb_abi.h | 1 + drivers/gpu/drm/xe/xe_guc.c | 2 +- drivers/gpu/drm/xe/xe_guc_ct.c | 320 ++++++++++++++++-- drivers/gpu/drm/xe/xe_guc_ct.h | 2 +- drivers/gpu/drm/xe/xe_guc_ct_types.h | 23 ++ 5 files changed, 318 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h index 8f86a16dc577..f58198cf2cf6 100644 --- a/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h @@ -52,6 +52,7 @@ struct guc_ct_buffer_desc { #define GUC_CTB_STATUS_OVERFLOW (1 << 0) #define GUC_CTB_STATUS_UNDERFLOW (1 << 1) #define GUC_CTB_STATUS_MISMATCH (1 << 2) +#define GUC_CTB_STATUS_DISABLED (1 << 3) u32 reserved[13]; } __packed; static_assert(sizeof(struct guc_ct_buffer_desc) == 64); diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index fa2b4ae1257a..b867f30847e5 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -1180,7 +1180,7 @@ void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p) xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); - xe_guc_ct_print(&guc->ct, p, false); + xe_guc_ct_print(&guc->ct, p); xe_guc_submit_print(guc, p); } diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index a63fe0a9077a..90701fd465c9 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -25,12 +25,48 @@ #include "xe_gt_sriov_pf_monitor.h" #include "xe_gt_tlb_invalidation.h" #include "xe_guc.h" +#include "xe_guc_log.h" #include "xe_guc_relay.h" #include "xe_guc_submit.h" #include "xe_map.h" #include "xe_pm.h" #include "xe_trace_guc.h" +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) +enum { + /* Internal states, not error conditions */ + CT_DEAD_STATE_REARM, /* 0x0001 */ + CT_DEAD_STATE_CAPTURE, /* 0x0002 */ + + /* Error conditions */ + CT_DEAD_SETUP, /* 0x0004 */ + CT_DEAD_H2G_WRITE, /* 0x0008 */ + CT_DEAD_H2G_HAS_ROOM, /* 0x0010 */ + CT_DEAD_G2H_READ, /* 0x0020 */ + CT_DEAD_G2H_RECV, /* 0x0040 */ + CT_DEAD_G2H_RELEASE, /* 0x0080 */ + CT_DEAD_DEADLOCK, /* 0x0100 */ + CT_DEAD_PROCESS_FAILED, /* 0x0200 */ + CT_DEAD_FAST_G2H, /* 0x0400 */ + CT_DEAD_PARSE_G2H_RESPONSE, /* 0x0800 */ + CT_DEAD_PARSE_G2H_UNKNOWN, /* 0x1000 */ + CT_DEAD_PARSE_G2H_ORIGIN, /* 0x2000 */ + CT_DEAD_PARSE_G2H_TYPE, /* 0x4000 */ +}; + +static void ct_dead_worker_func(struct work_struct *w); +static void ct_dead_capture(struct xe_guc_ct *ct, struct guc_ctb *ctb, u32 reason_code); + +#define CT_DEAD(ct, ctb, reason_code) ct_dead_capture((ct), (ctb), CT_DEAD_##reason_code) +#else +#define CT_DEAD(ct, ctb, reason) \ + do { \ + struct guc_ctb *_ctb = (ctb); \ + if (_ctb) \ + _ctb->info.broken = true; \ + } while (0) +#endif + /* Used when a CT send wants to block and / or receive data */ struct g2h_fence { u32 *response_buffer; @@ -183,6 +219,10 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) xa_init(&ct->fence_lookup); INIT_WORK(&ct->g2h_worker, g2h_worker_func); INIT_DELAYED_WORK(&ct->safe_mode_worker, safe_mode_worker_func); +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) + spin_lock_init(&ct->dead.lock); + INIT_WORK(&ct->dead.worker, ct_dead_worker_func); +#endif init_waitqueue_head(&ct->wq); init_waitqueue_head(&ct->g2h_fence_wq); @@ -419,10 +459,22 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) if (ct_needs_safe_mode(ct)) ct_enter_safe_mode(ct); +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) + /* + * The CT has now been reset so the dumper can be re-armed + * after any existing dead state has been dumped. + */ + spin_lock_irq(&ct->dead.lock); + if (ct->dead.reason) + ct->dead.reason |= CT_DEAD_STATE_REARM; + spin_unlock_irq(&ct->dead.lock); +#endif + return 0; err_out: xe_gt_err(gt, "Failed to enable GuC CT (%pe)\n", ERR_PTR(err)); + CT_DEAD(ct, NULL, SETUP); return err; } @@ -466,6 +518,19 @@ static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len) if (cmd_len > h2g->info.space) { h2g->info.head = desc_read(ct_to_xe(ct), h2g, head); + + if (h2g->info.head > h2g->info.size) { + struct xe_device *xe = ct_to_xe(ct); + u32 desc_status = desc_read(xe, h2g, status); + + desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_OVERFLOW); + + xe_gt_err(ct_to_gt(ct), "CT: invalid head offset %u >= %u)\n", + h2g->info.head, h2g->info.size); + CT_DEAD(ct, h2g, H2G_HAS_ROOM); + return false; + } + h2g->info.space = CIRC_SPACE(h2g->info.tail, h2g->info.head, h2g->info.size) - h2g->info.resv_space; @@ -521,10 +586,24 @@ static void __g2h_reserve_space(struct xe_guc_ct *ct, u32 g2h_len, u32 num_g2h) static void __g2h_release_space(struct xe_guc_ct *ct, u32 g2h_len) { + bool bad = false; + lockdep_assert_held(&ct->fast_lock); - xe_gt_assert(ct_to_gt(ct), ct->ctbs.g2h.info.space + g2h_len <= - ct->ctbs.g2h.info.size - ct->ctbs.g2h.info.resv_space); - xe_gt_assert(ct_to_gt(ct), ct->g2h_outstanding); + + bad = ct->ctbs.g2h.info.space + g2h_len > + ct->ctbs.g2h.info.size - ct->ctbs.g2h.info.resv_space; + bad |= !ct->g2h_outstanding; + + if (bad) { + xe_gt_err(ct_to_gt(ct), "Invalid G2H release: %d + %d vs %d - %d -> %d vs %d, outstanding = %d!\n", + ct->ctbs.g2h.info.space, g2h_len, + ct->ctbs.g2h.info.size, ct->ctbs.g2h.info.resv_space, + ct->ctbs.g2h.info.space + g2h_len, + ct->ctbs.g2h.info.size - ct->ctbs.g2h.info.resv_space, + ct->g2h_outstanding); + CT_DEAD(ct, &ct->ctbs.g2h, G2H_RELEASE); + return; + } ct->ctbs.g2h.info.space += g2h_len; if (!--ct->g2h_outstanding) @@ -551,12 +630,43 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len, u32 full_len; struct iosys_map map = IOSYS_MAP_INIT_OFFSET(&h2g->cmds, tail * sizeof(u32)); + u32 desc_status; full_len = len + GUC_CTB_HDR_LEN; lockdep_assert_held(&ct->lock); xe_gt_assert(gt, full_len <= GUC_CTB_MSG_MAX_LEN); - xe_gt_assert(gt, tail <= h2g->info.size); + + desc_status = desc_read(xe, h2g, status); + if (desc_status) { + xe_gt_err(gt, "CT write: non-zero status: %u\n", desc_status); + goto corrupted; + } + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { + u32 desc_tail = desc_read(xe, h2g, tail); + u32 desc_head = desc_read(xe, h2g, head); + + if (tail != desc_tail) { + desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_MISMATCH); + xe_gt_err(gt, "CT write: tail was modified %u != %u\n", desc_tail, tail); + goto corrupted; + } + + if (tail > h2g->info.size) { + desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_OVERFLOW); + xe_gt_err(gt, "CT write: tail out of range: %u vs %u\n", + tail, h2g->info.size); + goto corrupted; + } + + if (desc_head >= h2g->info.size) { + desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_OVERFLOW); + xe_gt_err(gt, "CT write: invalid head offset %u >= %u)\n", + desc_head, h2g->info.size); + goto corrupted; + } + } /* Command will wrap, zero fill (NOPs), return and check credits again */ if (tail + full_len > h2g->info.size) { @@ -609,6 +719,10 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len, desc_read(xe, h2g, head), h2g->info.tail); return 0; + +corrupted: + CT_DEAD(ct, &ct->ctbs.h2g, H2G_WRITE); + return -EPIPE; } /* @@ -720,7 +834,6 @@ static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len, { struct xe_device *xe = ct_to_xe(ct); struct xe_gt *gt = ct_to_gt(ct); - struct drm_printer p = xe_gt_info_printer(gt); unsigned int sleep_period_ms = 1; int ret; @@ -773,8 +886,13 @@ static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len, goto broken; #undef g2h_avail - if (dequeue_one_g2h(ct) < 0) + ret = dequeue_one_g2h(ct); + if (ret < 0) { + if (ret != -ECANCELED) + xe_gt_err(ct_to_gt(ct), "CTB receive failed (%pe)", + ERR_PTR(ret)); goto broken; + } goto try_again; } @@ -783,8 +901,7 @@ static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len, broken: xe_gt_err(gt, "No forward process on H2G, reset required\n"); - xe_guc_ct_print(ct, &p, true); - ct->ctbs.h2g.info.broken = true; + CT_DEAD(ct, &ct->ctbs.h2g, DEADLOCK); return -EDEADLK; } @@ -1011,6 +1128,7 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) else xe_gt_err(gt, "unexpected response %u for FAST_REQ H2G fence 0x%x!\n", type, fence); + CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE); return -EPROTO; } @@ -1019,8 +1137,9 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) if (unlikely(!g2h_fence)) { /* Don't tear down channel, as send could've timed out */ xe_gt_warn(gt, "G2H fence (%u) not found!\n", fence); + CT_DEAD(ct, NULL, PARSE_G2H_UNKNOWN); g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); - return 0; + return -EPROTO; } xe_gt_assert(gt, fence == g2h_fence->seqno); @@ -1062,7 +1181,7 @@ static int parse_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) if (unlikely(origin != GUC_HXG_ORIGIN_GUC)) { xe_gt_err(gt, "G2H channel broken on read, origin=%u, reset required\n", origin); - ct->ctbs.g2h.info.broken = true; + CT_DEAD(ct, &ct->ctbs.g2h, PARSE_G2H_ORIGIN); return -EPROTO; } @@ -1080,7 +1199,7 @@ static int parse_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) default: xe_gt_err(gt, "G2H channel broken on read, type=%u, reset required\n", type); - ct->ctbs.g2h.info.broken = true; + CT_DEAD(ct, &ct->ctbs.g2h, PARSE_G2H_TYPE); ret = -EOPNOTSUPP; } @@ -1157,9 +1276,11 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action); } - if (ret) + if (ret) { xe_gt_err(gt, "G2H action 0x%04x failed (%pe)\n", action, ERR_PTR(ret)); + CT_DEAD(ct, NULL, PROCESS_FAILED); + } return 0; } @@ -1169,7 +1290,7 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) struct xe_device *xe = ct_to_xe(ct); struct xe_gt *gt = ct_to_gt(ct); struct guc_ctb *g2h = &ct->ctbs.g2h; - u32 tail, head, len; + u32 tail, head, len, desc_status; s32 avail; u32 action; u32 *hxg; @@ -1188,6 +1309,50 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) xe_gt_assert(gt, xe_guc_ct_enabled(ct)); + desc_status = desc_read(xe, g2h, status); + if (desc_status) { + if (desc_status & GUC_CTB_STATUS_DISABLED) { + /* + * Potentially valid if a CLIENT_RESET request resulted in + * contexts/engines being reset. But should never happen as + * no contexts should be active when CLIENT_RESET is sent. + */ + xe_gt_err(gt, "CT read: unexpected G2H after GuC has stopped!\n"); + desc_status &= ~GUC_CTB_STATUS_DISABLED; + } + + if (desc_status) { + xe_gt_err(gt, "CT read: non-zero status: %u\n", desc_status); + goto corrupted; + } + } + + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { + u32 desc_tail = desc_read(xe, g2h, tail); + u32 desc_head = desc_read(xe, g2h, head); + + if (g2h->info.head != desc_head) { + desc_write(xe, g2h, status, desc_status | GUC_CTB_STATUS_MISMATCH); + xe_gt_err(gt, "CT read: head was modified %u != %u\n", + desc_head, g2h->info.head); + goto corrupted; + } + + if (g2h->info.head > g2h->info.size) { + desc_write(xe, g2h, status, desc_status | GUC_CTB_STATUS_OVERFLOW); + xe_gt_err(gt, "CT read: head out of range: %u vs %u\n", + g2h->info.head, g2h->info.size); + goto corrupted; + } + + if (desc_tail >= g2h->info.size) { + desc_write(xe, g2h, status, desc_status | GUC_CTB_STATUS_OVERFLOW); + xe_gt_err(gt, "CT read: invalid tail offset %u >= %u)\n", + desc_tail, g2h->info.size); + goto corrupted; + } + } + /* Calculate DW available to read */ tail = desc_read(xe, g2h, tail); avail = tail - g2h->info.head; @@ -1204,9 +1369,7 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) if (len > avail) { xe_gt_err(gt, "G2H channel broken on read, avail=%d, len=%d, reset required\n", avail, len); - g2h->info.broken = true; - - return -EPROTO; + goto corrupted; } head = (g2h->info.head + 1) % g2h->info.size; @@ -1252,6 +1415,10 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path) action, len, g2h->info.head, tail); return len; + +corrupted: + CT_DEAD(ct, &ct->ctbs.g2h, G2H_READ); + return -EPROTO; } static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len) @@ -1278,9 +1445,11 @@ static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len) xe_gt_warn(gt, "NOT_POSSIBLE"); } - if (ret) + if (ret) { xe_gt_err(gt, "G2H action 0x%04x failed (%pe)\n", action, ERR_PTR(ret)); + CT_DEAD(ct, NULL, FAST_G2H); + } } /** @@ -1340,7 +1509,6 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct) static void receive_g2h(struct xe_guc_ct *ct) { - struct xe_gt *gt = ct_to_gt(ct); bool ongoing; int ret; @@ -1377,9 +1545,8 @@ static void receive_g2h(struct xe_guc_ct *ct) mutex_unlock(&ct->lock); if (unlikely(ret == -EPROTO || ret == -EOPNOTSUPP)) { - struct drm_printer p = xe_gt_info_printer(gt); - - xe_guc_ct_print(ct, &p, false); + xe_gt_err(ct_to_gt(ct), "CT dequeue failed: %d", ret); + CT_DEAD(ct, NULL, G2H_RECV); kick_reset(ct); } } while (ret == 1); @@ -1407,9 +1574,8 @@ static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb, snapshot->cmds = kmalloc_array(ctb->info.size, sizeof(u32), atomic ? GFP_ATOMIC : GFP_KERNEL); - if (!snapshot->cmds) { - drm_err(&xe->drm, "Skipping CTB commands snapshot. Only CTB info will be available.\n"); + drm_err(&xe->drm, "Skipping CTB commands snapshot. Only CT info will be available.\n"); return; } @@ -1490,7 +1656,7 @@ struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, atomic ? GFP_ATOMIC : GFP_KERNEL); if (!snapshot) { - drm_err(&xe->drm, "Skipping CTB snapshot entirely.\n"); + xe_gt_err(ct_to_gt(ct), "Skipping CTB snapshot entirely.\n"); return NULL; } @@ -1554,16 +1720,114 @@ void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot) * xe_guc_ct_print - GuC CT Print. * @ct: GuC CT. * @p: drm_printer where it will be printed out. - * @atomic: Boolean to indicate if this is called from atomic context like - * reset or CTB handler or from some regular path like debugfs. * * This function quickly capture a snapshot and immediately print it out. */ -void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool atomic) +void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p) { struct xe_guc_ct_snapshot *snapshot; - snapshot = xe_guc_ct_snapshot_capture(ct, atomic); + snapshot = xe_guc_ct_snapshot_capture(ct, false); xe_guc_ct_snapshot_print(snapshot, p); xe_guc_ct_snapshot_free(snapshot); } + +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) +static void ct_dead_capture(struct xe_guc_ct *ct, struct guc_ctb *ctb, u32 reason_code) +{ + struct xe_guc_log_snapshot *snapshot_log; + struct xe_guc_ct_snapshot *snapshot_ct; + struct xe_guc *guc = ct_to_guc(ct); + unsigned long flags; + bool have_capture; + + if (ctb) + ctb->info.broken = true; + + /* Ignore further errors after the first dump until a reset */ + if (ct->dead.reported) + return; + + spin_lock_irqsave(&ct->dead.lock, flags); + + /* And only capture one dump at a time */ + have_capture = ct->dead.reason & CT_DEAD_STATE_CAPTURE; + ct->dead.reason |= (1 << reason_code) | + (1 << CT_DEAD_STATE_CAPTURE); + + spin_unlock_irqrestore(&ct->dead.lock, flags); + + if (have_capture) + return; + + snapshot_log = xe_guc_log_snapshot_capture(&guc->log, true); + snapshot_ct = xe_guc_ct_snapshot_capture((ct), true); + + spin_lock_irqsave(&ct->dead.lock, flags); + + if (ct->dead.snapshot_log || ct->dead.snapshot_ct) { + xe_gt_err(ct_to_gt(ct), "Got unexpected dead CT capture!\n"); + xe_guc_log_snapshot_free(snapshot_log); + xe_guc_ct_snapshot_free(snapshot_ct); + } else { + ct->dead.snapshot_log = snapshot_log; + ct->dead.snapshot_ct = snapshot_ct; + } + + spin_unlock_irqrestore(&ct->dead.lock, flags); + + queue_work(system_unbound_wq, &(ct)->dead.worker); +} + +static void ct_dead_print(struct xe_dead_ct *dead) +{ + struct xe_guc_ct *ct = container_of(dead, struct xe_guc_ct, dead); + struct xe_device *xe = ct_to_xe(ct); + struct xe_gt *gt = ct_to_gt(ct); + static int g_count; + struct drm_printer ip = xe_gt_info_printer(gt); + struct drm_printer lp = drm_line_printer(&ip, "Capture", ++g_count); + + if (!dead->reason) { + xe_gt_err(gt, "CTB is dead for no reason!?\n"); + return; + } + + drm_printf(&lp, "CTB is dead - reason=0x%X\n", dead->reason); + + /* Can't generate a genuine core dump at this point, so just do the good bits */ + drm_printf(&lp, "**** Xe Device Coredump ****\n"); + xe_device_snapshot_print(xe, &lp); + drm_printf(&lp, "**** GuC Log ****\n"); + xe_guc_log_snapshot_print(dead->snapshot_log, &lp); + drm_printf(&lp, "**** GuC CT ****\n"); + xe_guc_ct_snapshot_print(dead->snapshot_ct, &lp); + + drm_printf(&lp, "Done.\n"); +} + +static void ct_dead_worker_func(struct work_struct *w) +{ + struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, dead.worker); + + if (!ct->dead.reported) { + ct->dead.reported = true; + ct_dead_print(&ct->dead); + } + + spin_lock_irq(&ct->dead.lock); + + xe_guc_log_snapshot_free(ct->dead.snapshot_log); + ct->dead.snapshot_log = NULL; + xe_guc_ct_snapshot_free(ct->dead.snapshot_ct); + ct->dead.snapshot_ct = NULL; + + if (ct->dead.reason & CT_DEAD_STATE_REARM) { + /* A reset has occurred so re-arm the error reporting */ + ct->dead.reason = 0; + ct->dead.reported = false; + } + + spin_unlock_irq(&ct->dead.lock); +} +#endif diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h index 190202fce2d0..293041bed7ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.h +++ b/drivers/gpu/drm/xe/xe_guc_ct.h @@ -21,7 +21,7 @@ xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, bool atomic); void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, struct drm_printer *p); void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot); -void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool atomic); +void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p); static inline bool xe_guc_ct_enabled(struct xe_guc_ct *ct) { diff --git a/drivers/gpu/drm/xe/xe_guc_ct_types.h b/drivers/gpu/drm/xe/xe_guc_ct_types.h index 761cb9031298..85e127ec91d7 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct_types.h +++ b/drivers/gpu/drm/xe/xe_guc_ct_types.h @@ -86,6 +86,24 @@ enum xe_guc_ct_state { XE_GUC_CT_STATE_ENABLED, }; +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) +/** struct xe_dead_ct - Information for debugging a dead CT */ +struct xe_dead_ct { + /** @lock: protects memory allocation/free operations, and @reason updates */ + spinlock_t lock; + /** @reason: bit mask of CT_DEAD_* reason codes */ + unsigned int reason; + /** @reported: for preventing multiple dumps per error sequence */ + bool reported; + /** @worker: worker thread to get out of interrupt context before dumping */ + struct work_struct worker; + /** snapshot_ct: copy of CT state and CTB content at point of error */ + struct xe_guc_ct_snapshot *snapshot_ct; + /** snapshot_log: copy of GuC log at point of error */ + struct xe_guc_log_snapshot *snapshot_log; +}; +#endif + /** * struct xe_guc_ct - GuC command transport (CT) layer * @@ -128,6 +146,11 @@ struct xe_guc_ct { u32 msg[GUC_CTB_MSG_MAX_LEN]; /** @fast_msg: Message buffer */ u32 fast_msg[GUC_CTB_MSG_MAX_LEN]; + +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG) + /** @dead: information for debugging dead CTs */ + struct xe_dead_ct dead; +#endif }; #endif From patchwork Fri Sep 20 03:20:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808097 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DD958CF58C1 for ; Fri, 20 Sep 2024 03:20:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2528E10E79C; Fri, 20 Sep 2024 03:20:15 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="W10wzMCP"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4318A10E77B; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802409; x=1758338409; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XEdILZGFHVVr2nRvhsvgVC2PkstSC+lh/vnz6+dBdp4=; b=W10wzMCPUqDOnieSX8UboSp/xrp9puWwzBM+XSLVKudxDgfjYBB3UELz b6NKUSQdLDL/ki5VIVafl3NLQjvdMdcO+nooVsWto+itTEy2xVP5qzXO2 ctdPZZ/uIVq/t8ACBv/SQpcHAZIWa642jOeJITBmnKnvEXRwgGwFzazjL Lpk9j1YLiMSa770HcQOe1vWvCW5mkA/bNagMsYXWCtJmUpzExNgyzcwv6 U9sI+UYi6myfdz47dNU5sKzjSBD3D4+nlyp/ngcYjLQ7KlNCtm+hzDHof o8pygqHxEju2tmVzXb4gO71H7Mbc0og597Kwp3Q3TtMoAvtjM68hnz2lT g==; X-CSE-ConnectionGUID: q0uZBQezQqecX8VxHE95Iw== X-CSE-MsgGUID: U4dhAq/RTTCxDf3v6TVRpg== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269808" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269808" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:09 -0700 X-CSE-ConnectionGUID: Vtwj4PwwRFSC3k2/f3dT6g== X-CSE-MsgGUID: xhuX+ypRTXm6a/PQ3KKFjw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746194" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:09 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison , Julia Filipchuk Subject: [PATCH v8 09/11] drm/xe/guc: Dump entire CTB on errors Date: Thu, 19 Sep 2024 20:20:04 -0700 Message-ID: <20240920032007.629624-10-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison The dump of the CT buffers was only showing the unprocessed data which is not generally useful for saying why a hang occurred - because it was probably caused by the commands that were just processed. So save and dump the entire buffer but in a more compact dump format. Also zero fill it on allocation to avoid confusion over uninitialised data in the dump. v2: Add kerneldoc - review feedback from Michal W. v3: Fix kerneldoc. v4: Use ascii85 instead of hexdump (review feedback from Matthew B). v5: Dump the entire CTB object rather than separately dumping just the H2G and G2H sections. That way it includes the full header info. Signed-off-by: John Harrison Reviewed-by: Julia Filipchuk --- drivers/gpu/drm/xe/xe_guc_ct.c | 94 ++++++++++------------------ drivers/gpu/drm/xe/xe_guc_ct.h | 8 +-- drivers/gpu/drm/xe/xe_guc_ct_types.h | 6 +- 3 files changed, 41 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 90701fd465c9..5546d4f87ebb 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -17,6 +17,7 @@ #include "abi/guc_actions_sriov_abi.h" #include "abi/guc_klvs_abi.h" #include "xe_bo.h" +#include "xe_devcoredump.h" #include "xe_device.h" #include "xe_gt.h" #include "xe_gt_pagefault.h" @@ -435,6 +436,7 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) xe_gt_assert(gt, !xe_guc_ct_enabled(ct)); + xe_map_memset(xe, &ct->bo->vmap, 0, 0, ct->bo->size); guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->bo->vmap); guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->bo->vmap); @@ -1562,48 +1564,33 @@ static void g2h_worker_func(struct work_struct *w) receive_g2h(ct); } -static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb, - struct guc_ctb_snapshot *snapshot, - bool atomic) +struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_alloc(struct xe_guc_ct *ct, bool atomic) { - u32 head, tail; + struct xe_guc_ct_snapshot *snapshot; - xe_map_memcpy_from(xe, &snapshot->desc, &ctb->desc, 0, - sizeof(struct guc_ct_buffer_desc)); - memcpy(&snapshot->info, &ctb->info, sizeof(struct guc_ctb_info)); + snapshot = kzalloc(sizeof(*snapshot), atomic ? GFP_ATOMIC : GFP_KERNEL); + if (!snapshot) + return NULL; - snapshot->cmds = kmalloc_array(ctb->info.size, sizeof(u32), - atomic ? GFP_ATOMIC : GFP_KERNEL); - if (!snapshot->cmds) { - drm_err(&xe->drm, "Skipping CTB commands snapshot. Only CT info will be available.\n"); - return; + if (ct->bo) { + snapshot->ctb_size = ct->bo->size; + snapshot->ctb = kmalloc(snapshot->ctb_size, atomic ? GFP_ATOMIC : GFP_KERNEL); } - head = snapshot->desc.head; - tail = snapshot->desc.tail; - - if (head != tail) { - struct iosys_map map = - IOSYS_MAP_INIT_OFFSET(&ctb->cmds, head * sizeof(u32)); - - while (head != tail) { - snapshot->cmds[head] = xe_map_rd(xe, &map, 0, u32); - ++head; - if (head == ctb->info.size) { - head = 0; - map = ctb->cmds; - } else { - iosys_map_incr(&map, sizeof(u32)); - } - } - } + return snapshot; +} + +static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb, + struct guc_ctb_snapshot *snapshot) +{ + xe_map_memcpy_from(xe, &snapshot->desc, &ctb->desc, 0, + sizeof(struct guc_ct_buffer_desc)); + memcpy(&snapshot->info, &ctb->info, sizeof(struct guc_ctb_info)); } static void guc_ctb_snapshot_print(struct guc_ctb_snapshot *snapshot, struct drm_printer *p) { - u32 head, tail; - drm_printf(p, "\tsize: %d\n", snapshot->info.size); drm_printf(p, "\tresv_space: %d\n", snapshot->info.resv_space); drm_printf(p, "\thead: %d\n", snapshot->info.head); @@ -1613,25 +1600,6 @@ static void guc_ctb_snapshot_print(struct guc_ctb_snapshot *snapshot, drm_printf(p, "\thead (memory): %d\n", snapshot->desc.head); drm_printf(p, "\ttail (memory): %d\n", snapshot->desc.tail); drm_printf(p, "\tstatus (memory): 0x%x\n", snapshot->desc.status); - - if (!snapshot->cmds) - return; - - head = snapshot->desc.head; - tail = snapshot->desc.tail; - - while (head != tail) { - drm_printf(p, "\tcmd[%d]: 0x%08x\n", head, - snapshot->cmds[head]); - ++head; - if (head == snapshot->info.size) - head = 0; - } -} - -static void guc_ctb_snapshot_free(struct guc_ctb_snapshot *snapshot) -{ - kfree(snapshot->cmds); } /** @@ -1652,9 +1620,7 @@ struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, struct xe_device *xe = ct_to_xe(ct); struct xe_guc_ct_snapshot *snapshot; - snapshot = kzalloc(sizeof(*snapshot), - atomic ? GFP_ATOMIC : GFP_KERNEL); - + snapshot = xe_guc_ct_snapshot_alloc(ct, atomic); if (!snapshot) { xe_gt_err(ct_to_gt(ct), "Skipping CTB snapshot entirely.\n"); return NULL; @@ -1663,12 +1629,13 @@ struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, if (xe_guc_ct_enabled(ct) || ct->state == XE_GUC_CT_STATE_STOPPED) { snapshot->ct_enabled = true; snapshot->g2h_outstanding = READ_ONCE(ct->g2h_outstanding); - guc_ctb_snapshot_capture(xe, &ct->ctbs.h2g, - &snapshot->h2g, atomic); - guc_ctb_snapshot_capture(xe, &ct->ctbs.g2h, - &snapshot->g2h, atomic); + guc_ctb_snapshot_capture(xe, &ct->ctbs.h2g, &snapshot->h2g); + guc_ctb_snapshot_capture(xe, &ct->ctbs.g2h, &snapshot->g2h); } + if (ct->bo && snapshot->ctb) + xe_map_memcpy_from(xe, snapshot->ctb, &ct->bo->vmap, 0, snapshot->ctb_size); + return snapshot; } @@ -1691,9 +1658,15 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, drm_puts(p, "G2H CTB (all sizes in DW):\n"); guc_ctb_snapshot_print(&snapshot->g2h, p); - drm_printf(p, "\tg2h outstanding: %d\n", snapshot->g2h_outstanding); + + if (snapshot->ctb) { + xe_print_blob_ascii85(p, "CTB data", snapshot->ctb, 0, snapshot->ctb_size); + } else { + drm_printf(p, "CTB snapshot missing!\n"); + return; + } } else { drm_puts(p, "CT disabled\n"); } @@ -1711,8 +1684,7 @@ void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot) if (!snapshot) return; - guc_ctb_snapshot_free(&snapshot->h2g); - guc_ctb_snapshot_free(&snapshot->g2h); + kfree(snapshot->ctb); kfree(snapshot); } diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h index 293041bed7ed..338f0b75d29f 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.h +++ b/drivers/gpu/drm/xe/xe_guc_ct.h @@ -9,6 +9,7 @@ #include "xe_guc_ct_types.h" struct drm_printer; +struct xe_device; int xe_guc_ct_init(struct xe_guc_ct *ct); int xe_guc_ct_enable(struct xe_guc_ct *ct); @@ -16,10 +17,9 @@ void xe_guc_ct_disable(struct xe_guc_ct *ct); void xe_guc_ct_stop(struct xe_guc_ct *ct); void xe_guc_ct_fast_path(struct xe_guc_ct *ct); -struct xe_guc_ct_snapshot * -xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, bool atomic); -void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, - struct drm_printer *p); +struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_alloc(struct xe_guc_ct *ct, bool atomic); +struct xe_guc_ct_snapshot *xe_guc_ct_snapshot_capture(struct xe_guc_ct *ct, bool atomic); +void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, struct drm_printer *p); void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot); void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p); diff --git a/drivers/gpu/drm/xe/xe_guc_ct_types.h b/drivers/gpu/drm/xe/xe_guc_ct_types.h index 85e127ec91d7..8e1b9d981d61 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct_types.h +++ b/drivers/gpu/drm/xe/xe_guc_ct_types.h @@ -52,8 +52,6 @@ struct guc_ctb { struct guc_ctb_snapshot { /** @desc: snapshot of the CTB descriptor */ struct guc_ct_buffer_desc desc; - /** @cmds: snapshot of the CTB commands */ - u32 *cmds; /** @info: snapshot of the CTB info */ struct guc_ctb_info info; }; @@ -70,6 +68,10 @@ struct xe_guc_ct_snapshot { struct guc_ctb_snapshot g2h; /** @h2g: H2G CTB snapshot */ struct guc_ctb_snapshot h2g; + /** @ctb_size: size of the snapshot of the CTB */ + size_t ctb_size; + /** @ctb: snapshot of the entire CTB */ + u32 *ctb; }; /** From patchwork Fri Sep 20 03:20:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808093 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BE12FCF58C3 for ; Fri, 20 Sep 2024 03:20:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A41CC10E791; Fri, 20 Sep 2024 03:20:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="RVFnUV0M"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 63F8A10E77E; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802409; x=1758338409; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZX08GJ+B7S70HH6Udtlg1KJsdNtorenXuuQa0d4IjK4=; b=RVFnUV0M2j0o152v1cn7tUhEQxHCzKQnPBrSQgj1Pl7LPcLNk/jhFC04 vrGqzmaj7V84E85KsjiyU+9EZOCqdmU6GLoFsMGCiOfiRDOKRDt6YQ1BQ ZJYDMSE+FZWtp2CHBu4mbTttSFs/y0iEMLBdgEC/2SBlQ4W5NoY4gl8vB VtWsQn1Qjq6xohCNNCrljYFSVhocw/R11hh6zYiQ4h6SrkblTQ/LTecmg l6hjVeAMtA+7Y2eQLLrHOYwseaNBPrrXHouY4LKWS9L+urpo10nnH09AM SaOziOUr2ikTbqyhKEq/cjvZ/KdCtecQiX5npzL+uMp0eKFIqRkx64hMG g==; X-CSE-ConnectionGUID: GJjmQ0/pSQmnFrpYR9MdEg== X-CSE-MsgGUID: fsuL/8zvRTabyjp/OjGIIw== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269809" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269809" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:09 -0700 X-CSE-ConnectionGUID: NDf72ZJoRG+HFgsk0o1QZg== X-CSE-MsgGUID: DJkHyfMwQ2e33GYgJSTfWw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746197" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:09 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison , Julia Filipchuk Subject: [PATCH v8 10/11] drm/xe/guc: Add GuC log to devcoredump captures Date: Thu, 19 Sep 2024 20:20:05 -0700 Message-ID: <20240920032007.629624-11-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Include the GuC log in devcoredump captures because they can be useful with debugging certain types of bug. v2: Fix kerneldoc v3: Drop module parameter as now using more compact ascii85 encoding rather than hexdump (although still not compressed) (review feedback from Matthew B). Rebase onto recent refactoring of devcoredump code. v4: Don't move the submission snapshot inside the GuC internals structure 'cos it really doesn't belong there. Signed-off-by: John Harrison Reviewed-by: Julia Filipchuk --- drivers/gpu/drm/xe/xe_devcoredump.c | 16 ++++++++++++---- drivers/gpu/drm/xe/xe_devcoredump_types.h | 10 +++++++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 0884c49942fe..a2d816713489 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -18,8 +18,10 @@ #include "xe_gt.h" #include "xe_gt_printk.h" #include "xe_guc_ct.h" +#include "xe_guc_log.h" #include "xe_guc_submit.h" #include "xe_hw_engine.h" +#include "xe_module.h" #include "xe_sched_job.h" #include "xe_vm.h" @@ -100,8 +102,10 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id); drm_printf(&p, "\tTile: %d\n", ss->gt->tile->id); + drm_puts(&p, "\n**** GuC Log ****\n"); + xe_guc_log_snapshot_print(ss->guc.log, &p); drm_puts(&p, "\n**** GuC CT ****\n"); - xe_guc_ct_snapshot_print(ss->ct, &p); + xe_guc_ct_snapshot_print(ss->guc.ct, &p); drm_puts(&p, "\n**** Contexts ****\n"); xe_guc_exec_queue_snapshot_print(ss->ge, &p); @@ -124,8 +128,11 @@ static void xe_devcoredump_snapshot_free(struct xe_devcoredump_snapshot *ss) { int i; - xe_guc_ct_snapshot_free(ss->ct); - ss->ct = NULL; + xe_guc_log_snapshot_free(ss->guc.log); + ss->guc.log = NULL; + + xe_guc_ct_snapshot_free(ss->guc.ct); + ss->guc.ct = NULL; xe_guc_exec_queue_snapshot_free(ss->ge); ss->ge = NULL; @@ -253,7 +260,8 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, if (xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL)) xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); - ss->ct = xe_guc_ct_snapshot_capture(&guc->ct, true); + ss->guc.log = xe_guc_log_snapshot_capture(&guc->log, true); + ss->guc.ct = xe_guc_ct_snapshot_capture(&guc->ct, true); ss->ge = xe_guc_exec_queue_snapshot_capture(q); ss->job = xe_sched_job_snapshot_capture(job); ss->vm = xe_vm_snapshot_capture(q->vm); diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 3cc2f095fdfb..06ac75ce63dd 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -34,9 +34,13 @@ struct xe_devcoredump_snapshot { /** @work: Workqueue for deferred capture outside of signaling context */ struct work_struct work; - /* GuC snapshots */ - /** @ct: GuC CT snapshot */ - struct xe_guc_ct_snapshot *ct; + /** @guc: GuC snapshots */ + struct { + /** @guc.ct: GuC CT snapshot */ + struct xe_guc_ct_snapshot *ct; + /** @guc.log: GuC log snapshot */ + struct xe_guc_log_snapshot *log; + } guc; /** @ge: GuC Submission Engine snapshot */ struct xe_guc_submit_exec_queue_snapshot *ge; From patchwork Fri Sep 20 03:20:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 13808092 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D9750CF58C0 for ; Fri, 20 Sep 2024 03:20:21 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 487D810E78B; Fri, 20 Sep 2024 03:20:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="ITxe1YUI"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 830A410E77B; Fri, 20 Sep 2024 03:20:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726802409; x=1758338409; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=svjqtNycWUjcFbHa53MfFoJaAUhKPxpQkMolZrtJ/WM=; b=ITxe1YUItpSKY1+uGE4L8bUXcxmo5uoTy8dYyADcVNnkXsfYq8CwB25V kE/S/gJsYzyZ03S0qPre+7fraMwNRAsT1c2lWBe2lBeM0IFNHGiRS5Dkl p2NJJcNO47NmyjzEPZA+b6AgsmpPyIbZxXvN2g1xM5oLFGGm+maoKhaht TLppt95LFoyGBq2k0M18xO8ywlVBmU5FvVqn5bxKIVpxnrTFnjuFMVGPe B2rHY/RiCNhcZPhEKoPL2B3EReXH+yU1uOl3GK8TffvLWAW6rvZxOfQjG 5Np4vb/1hMgw6E+bW/CDFj/6ut1ZHK/J2iCFqGw130ba+jHSjOtdg0jPi g==; X-CSE-ConnectionGUID: woF2EBtQSCOYaXrbAzGXSQ== X-CSE-MsgGUID: /9KuSlPURZaIVdM8irdVSQ== X-IronPort-AV: E=McAfee;i="6700,10204,11200"; a="25269810" X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="25269810" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2024 20:20:09 -0700 X-CSE-ConnectionGUID: a6VtKzd7SDeRdLQdVI5DXw== X-CSE-MsgGUID: Qv6b1u+tT7a94gM+qECc4g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,243,1719903600"; d="scan'208";a="69746200" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by fmviesa007.fm.intel.com with ESMTP; 19 Sep 2024 20:20:09 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Cc: DRI-Devel@Lists.FreeDesktop.Org, John Harrison , Julia Filipchuk Subject: [PATCH v8 11/11] drm/xe/guc: Add a helper function for dumping GuC log to dmesg Date: Thu, 19 Sep 2024 20:20:06 -0700 Message-ID: <20240920032007.629624-12-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240920032007.629624-1-John.C.Harrison@Intel.com> References: <20240920032007.629624-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Create a helper function that can be used to dump the GuC log to dmesg in a manner that is reliable for extraction and decode. The intention is that calls to this can be added by developers when debugging specific issues that require a GuC log but do not allow easy capture of the log - e.g. failures in selftests and failues that lead to kernel hangs. Also note that this is really a temporary stop-gap. The aim is to allow on demand creation and dumping of devcoredump captures (which includes the GuC log and much more). Currently this is not possible as much of the devcoredump code requires a 'struct xe_sched_job' and those are not available at many places that might want to do the dump. v2: Add kerneldoc - review feedback from Michal W. Signed-off-by: John Harrison Reviewed-by: Julia Filipchuk --- drivers/gpu/drm/xe/xe_guc_log.c | 18 ++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_log.h | 1 + 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 24564624e91e..d7be69f20af4 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -214,6 +214,24 @@ void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_ } } +/** + * xe_guc_log_print_dmesg - dump a copy of the GuC log to dmesg + * @log: GuC log structure + */ +void xe_guc_log_print_dmesg(struct xe_guc_log *log) +{ + struct xe_gt *gt = log_to_gt(log); + static int g_count; + struct drm_printer ip = xe_gt_info_printer(gt); + struct drm_printer lp = drm_line_printer(&ip, "Capture", ++g_count); + + drm_printf(&lp, "Dumping GuC log for %ps...\n", __builtin_return_address(0)); + + xe_guc_log_print(log, &lp); + + drm_printf(&lp, "Done.\n"); +} + /** * xe_guc_log_print - dump a copy of the GuC log to some useful location * @log: GuC log structure diff --git a/drivers/gpu/drm/xe/xe_guc_log.h b/drivers/gpu/drm/xe/xe_guc_log.h index 949d2c98343d..1fb2fae1f4e1 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.h +++ b/drivers/gpu/drm/xe/xe_guc_log.h @@ -39,6 +39,7 @@ struct xe_device; int xe_guc_log_init(struct xe_guc_log *log); void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p); +void xe_guc_log_print_dmesg(struct xe_guc_log *log); struct xe_guc_log_snapshot *xe_guc_log_snapshot_capture(struct xe_guc_log *log, bool atomic); void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_printer *p); void xe_guc_log_snapshot_free(struct xe_guc_log_snapshot *snapshot);