From patchwork Fri Oct 20 15:58:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13430907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0ED83C0032E for ; Fri, 20 Oct 2023 15:56:11 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8178210E0F7; Fri, 20 Oct 2023 15:56:08 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3B88E10E5D3; Fri, 20 Oct 2023 15:56:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697817362; x=1729353362; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=Vy/ukkrKY3Bee42OE70DtgvDrpgAoAmgVghNJ2iRz3Y=; b=fCqeU9HSrQFSDAWfgHih3Q5RSKasElsnWc8YgmzRrZoaTgBrEyAhot6S YxYJ1FjXJzVTbCDVRjQXsMZqKi0bu8G3FX2TPLmfktOurlurveJF5A6Ku +4OZXp+uJ1b2ec24ozbrrzSJWVEYk7qHJpQ6eU7SZxfuOx1X1n5SgpCNm v1Zfgf4y3UwEs9LbI/enDoGRITIALbeMAIBnhZZ5Es6ZABJbzQkebfIom dp8L8WqNKdP+70vb2dLQdJWSL8d3MQpGaoTHgd2Xt0mRvbWjoJ8fwL2fU o5PcntxNGqo3iFPgkJCy2AKm1AOnNsZygKX+cm9neJxCMLpv9xphtkpeE A==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="452999993" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="452999993" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="1088778561" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="1088778561" 
Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:55:57 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com, michael.j.ruhl@intel.com Subject: [RFC v4 1/5] drm/netlink: Add netlink infrastructure Date: Fri, 20 Oct 2023 21:28:31 +0530 Message-Id: <20231020155835.1295524-2-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> References: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Define the netlink registration interface, commands, and attributes that DRM drivers can use in common. This patch uses the generic netlink family to expose various device statistics. At present it defines commands for exposing RAS error counters. v2: define common interfaces to the genl netlink subsystem that all DRM drivers can leverage (Tomer Tayar) v3: drop the DRIVER_NETLINK flag and use the driver_genl_ops structure to register with the netlink subsystem (Daniel Vetter) v4: (Michael J. Ruhl) 1. rename drm_genl_send to drm_genl_reply 2. catch the error from xa_store and handle it appropriately Cc: Tomer Tayar Cc: Daniel Vetter Cc: Michael J. Ruhl Signed-off-by: Aravind Iddamsetty Reviewed-by: Michael J.
Ruhl --- drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/drm_drv.c | 7 ++ drivers/gpu/drm/drm_netlink.c | 188 +++++++++++++++++++++++++++++++++ include/drm/drm_device.h | 8 ++ include/drm/drm_drv.h | 7 ++ include/drm/drm_netlink.h | 30 ++++++ include/uapi/drm/drm_netlink.h | 83 +++++++++++++++ 7 files changed, 324 insertions(+) create mode 100644 drivers/gpu/drm/drm_netlink.c create mode 100644 include/drm/drm_netlink.h create mode 100644 include/uapi/drm/drm_netlink.h diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index ee64c51274ad..60864369adaa 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -35,6 +35,7 @@ drm-y := \ drm_mode_object.o \ drm_modes.o \ drm_modeset_lock.o \ + drm_netlink.o \ drm_plane.o \ drm_prime.o \ drm_print.o \ diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 535f16e7882e..31f55c1c7524 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -937,6 +937,12 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags) if (ret) goto err_minors; + if (driver->genl_ops) { + ret = drm_genl_register(dev); + if (ret) + goto err_minors; + } + ret = create_compat_control_link(dev); if (ret) goto err_minors; @@ -1074,6 +1080,7 @@ static void drm_core_exit(void) { drm_privacy_screen_lookup_exit(); accel_core_exit(); + drm_genl_exit(); unregister_chrdev(DRM_MAJOR, "drm"); debugfs_remove(drm_debugfs_root); drm_sysfs_destroy(); diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c new file mode 100644 index 000000000000..8add249c1da3 --- /dev/null +++ b/drivers/gpu/drm/drm_netlink.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +DEFINE_XARRAY(drm_dev_xarray); + +/** + * drm_genl_reply - response to a request + * @msg: socket buffer + * @info: receiver information + * @usrhdr: pointer to user specific header in the message buffer + * + 
* RETURNS: + * 0 on success and negative error code on failure + */ +int drm_genl_reply(struct sk_buff *msg, struct genl_info *info, void *usrhdr) +{ + int ret; + + genlmsg_end(msg, usrhdr); + + ret = genlmsg_reply(msg, info); + if (ret) + nlmsg_free(msg); + + return ret; +} +EXPORT_SYMBOL(drm_genl_reply); + +/** + * drm_genl_alloc_msg - allocate genl message buffer + * @dev: drm_device for which the message is being allocated + * @info: receiver information + * @usrhdr: pointer to user specific header in the message buffer + * + * RETURNS: + * pointer to new allocated buffer on success, NULL on failure + */ +struct sk_buff * +drm_genl_alloc_msg(struct drm_device *dev, + struct genl_info *info, + size_t msg_size, void **usrhdr) +{ + struct sk_buff *new_msg; + + new_msg = genlmsg_new(msg_size, GFP_KERNEL); + if (!new_msg) + return new_msg; + + *usrhdr = genlmsg_put_reply(new_msg, info, &dev->drm_genl_family, 0, info->genlhdr->cmd); + if (!*usrhdr) { + nlmsg_free(new_msg); + new_msg = NULL; + } + + return new_msg; +} +EXPORT_SYMBOL(drm_genl_alloc_msg); + +static struct drm_device *genl_to_dev(struct genl_info *info) +{ + return xa_load(&drm_dev_xarray, info->nlhdr->nlmsg_type); +} + +static int drm_genl_list_errors(struct sk_buff *msg, struct genl_info *info) +{ + struct drm_device *dev = genl_to_dev(info); + + if (GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_REQUEST)) + return -EINVAL; + + if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit)) + return -EOPNOTSUPP; + + return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info); +} + +static int drm_genl_read_error(struct sk_buff *msg, struct genl_info *info) +{ + struct drm_device *dev = genl_to_dev(info); + + if (GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_ERROR_ID)) + return -EINVAL; + + if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit)) + return -EOPNOTSUPP; + + return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info); +} + +/* attribute policies */ +static const struct nla_policy 
drm_attr_policy_query[DRM_ATTR_MAX + 1] = { + [DRM_RAS_ATTR_REQUEST] = { .type = NLA_U8 }, +}; + +static const struct nla_policy drm_attr_policy_read_one[DRM_ATTR_MAX + 1] = { + [DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 }, +}; + +/* drm genl operations definition */ +const struct genl_ops drm_genl_ops[] = { + { + .cmd = DRM_RAS_CMD_QUERY, + .doit = drm_genl_list_errors, + .policy = drm_attr_policy_query, + }, + { + .cmd = DRM_RAS_CMD_READ_ONE, + .doit = drm_genl_read_error, + .policy = drm_attr_policy_read_one, + }, + { + .cmd = DRM_RAS_CMD_READ_ALL, + .doit = drm_genl_list_errors, + .policy = drm_attr_policy_query, + }, +}; + +static void drm_genl_family_init(struct drm_device *dev) +{ + /* Use drm primary node name eg: card0 to name the genl family */ + snprintf(dev->drm_genl_family.name, sizeof(dev->drm_genl_family.name), "%s", dev->primary->kdev->kobj.name); + dev->drm_genl_family.version = DRM_GENL_VERSION; + dev->drm_genl_family.parallel_ops = true; + dev->drm_genl_family.ops = drm_genl_ops; + dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops); + dev->drm_genl_family.maxattr = DRM_ATTR_MAX; + dev->drm_genl_family.module = dev->dev->driver->owner; +} + +static void drm_genl_deregister(struct drm_device *dev, void *arg) +{ + drm_dbg_driver(dev, "unregistering genl family %s\n", dev->drm_genl_family.name); + + xa_erase(&drm_dev_xarray, dev->drm_genl_family.id); + + genl_unregister_family(&dev->drm_genl_family); +} + +/** + * drm_genl_register - Register genl family + * @dev: drm_device for which genl family needs to be registered + * + * RETURNS: + * 0 on success and negative error code on failure + */ +int drm_genl_register(struct drm_device *dev) +{ + int ret; + + drm_genl_family_init(dev); + + ret = genl_register_family(&dev->drm_genl_family); + if (ret < 0) { + drm_warn(dev, "genl family registration failed\n"); + return ret; + } + + drm_dbg_driver(dev, "genl family id %d and name %s\n", dev->drm_genl_family.id, dev->drm_genl_family.name); + + ret = 
xa_err(xa_store(&drm_dev_xarray, dev->drm_genl_family.id, dev, GFP_KERNEL)); + if (ret) + goto genl_unregister; + + ret = drmm_add_action_or_reset(dev, drm_genl_deregister, NULL); + + return ret; + +genl_unregister: + genl_unregister_family(&dev->drm_genl_family); + return ret; +} + +/** + * drm_genl_exit - destroy drm_dev_xarray + */ +void drm_genl_exit(void) +{ + xa_destroy(&drm_dev_xarray); +} diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index c490977ee250..d3ae91b7714d 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -8,6 +8,7 @@ #include #include +#include struct drm_driver; struct drm_minor; @@ -318,6 +319,13 @@ struct drm_device { */ struct dentry *debugfs_root; + /** + * @drm_genl_family: + * + * Generic netlink family registration structure. + */ + struct genl_family drm_genl_family; + /* Everything below here is for legacy driver, never use! */ /* private: */ #if IS_ENABLED(CONFIG_DRM_LEGACY) diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h index e2640dc64e08..ebdb7850d235 100644 --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -434,6 +434,13 @@ struct drm_driver { */ const struct file_operations *fops; + /** + * @genl_ops: + * + * Driver's private callbacks for genl commands + */ + const struct driver_genl_ops *genl_ops; + #ifdef CONFIG_DRM_LEGACY /* Everything below here is for legacy driver, never use!
*/ /* private: */ diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h new file mode 100644 index 000000000000..54527dae7847 --- /dev/null +++ b/include/drm/drm_netlink.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef __DRM_NETLINK_H__ +#define __DRM_NETLINK_H__ + +#include +#include +#include +#include + +struct drm_device; + +struct driver_genl_ops { + int (*doit)(struct drm_device *dev, + struct sk_buff *skb, + struct genl_info *info); +}; + +int drm_genl_register(struct drm_device *dev); +void drm_genl_exit(void); +int drm_genl_reply(struct sk_buff *msg, struct genl_info *info, void *usrhdr); +struct sk_buff * +drm_genl_alloc_msg(struct drm_device *dev, + struct genl_info *info, + size_t msg_size, void **usrhdr); +#endif + diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h new file mode 100644 index 000000000000..aab42147a20e --- /dev/null +++ b/include/uapi/drm/drm_netlink.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright 2023 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef _DRM_NETLINK_H_ +#define _DRM_NETLINK_H_ + +#define DRM_GENL_VERSION 1 + +#if defined(__cplusplus) +extern "C" { +#endif + +/** + * enum drm_genl_error_cmds - Supported error commands + * + */ +enum drm_genl_error_cmds { + DRM_CMD_UNSPEC, + /** @DRM_RAS_CMD_QUERY: Command to list all error names with config-id */ + DRM_RAS_CMD_QUERY, + /** @DRM_RAS_CMD_READ_ONE: Command to get a counter for a specific error */ + DRM_RAS_CMD_READ_ONE, + /** @DRM_RAS_CMD_READ_ALL: Command to get counters of all errors */ + DRM_RAS_CMD_READ_ALL, + + __DRM_CMD_MAX, + DRM_CMD_MAX = __DRM_CMD_MAX - 1, +}; + +/** + * enum drm_error_attr - Attributes to use with drm_genl_error_cmds + * + */ +enum drm_error_attr { + DRM_ATTR_UNSPEC, + DRM_ATTR_PAD = DRM_ATTR_UNSPEC, + /** + * @DRM_RAS_ATTR_REQUEST: Should be used with DRM_RAS_CMD_QUERY and + * DRM_RAS_CMD_READ_ALL + */ + DRM_RAS_ATTR_REQUEST, /* NLA_U8 */ + /** + * @DRM_RAS_ATTR_QUERY_REPLY: First nested attribute sent as a + * response to DRM_RAS_CMD_QUERY, DRM_RAS_CMD_READ_ALL commands.
+ */ + DRM_RAS_ATTR_QUERY_REPLY, /*NLA_NESTED*/ + /** @DRM_RAS_ATTR_ERROR_NAME: Used to pass error name */ + DRM_RAS_ATTR_ERROR_NAME, /* NLA_NUL_STRING */ + /** @DRM_RAS_ATTR_ERROR_ID: Used to pass error id */ + DRM_RAS_ATTR_ERROR_ID, /* NLA_U64 */ + /** @DRM_RAS_ATTR_ERROR_VALUE: Used to pass error value */ + DRM_RAS_ATTR_ERROR_VALUE, /* NLA_U64 */ + + __DRM_ATTR_MAX, + DRM_ATTR_MAX = __DRM_ATTR_MAX - 1, +}; + +#if defined(__cplusplus) +} +#endif + +#endif From patchwork Fri Oct 20 15:58:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13430908 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CA616CDB474 for ; Fri, 20 Oct 2023 15:56:14 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4ED5C10E5D4; Fri, 20 Oct 2023 15:56:10 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 426E310E0F7; Fri, 20 Oct 2023 15:56:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697817366; x=1729353366; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=JqCT+/uFqs8dr/7DDg6QssbfdFXWgJeKvvet4LKFO2c=; b=PSjpyu+gru72wpcPPMsKtWjm+8W6maUdlxeu5DSkC70QL4fahCdMik7x JHcyO39InlTRsUZWG6lF7GHPNlYv+v6uxFyHD6NmnP/EI8WoslTGiSL2B mmCiIqUXOm+Emzmu2+8iW5MFGY47/YG2ecKPAYDUH2zOi3bANAeIC19YC hiIF/JDybxRuHOc0ZbM7G5VjisoWz1u5tY60YKjSYn8hAxpfi1LMXPmgc HD2dbVXzYCqio3jiQH9MqPTRzroctWtAEMS1XA8wTRkaxFyRlIXiaOA6A 
t9G7fC0LbLFXu775j6Hcl7fs/PTL+yZE+HjdW222MmrRMTTWJZA/oBVU3 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="453000007" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="453000007" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="1088778587" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="1088778587" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:01 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com, michael.j.ruhl@intel.com Subject: [RFC v2 2/5] drm/xe/RAS: Register netlink capability Date: Fri, 20 Oct 2023 21:28:32 +0530 Message-Id: <20231020155835.1295524-3-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> References: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Register netlink capability with DRM and register the driver callbacks for the DRM RAS netlink commands. v2: Move the netlink registration parts to the DRM subsystem (Tomer Tayar) Cc: Tomer Tayar Signed-off-by: Aravind Iddamsetty Reviewed-by: Michael J.
Ruhl --- drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/xe_device.c | 4 ++++ drivers/gpu/drm/xe/xe_device_types.h | 1 + drivers/gpu/drm/xe/xe_netlink.c | 22 ++++++++++++++++++++++ 4 files changed, 28 insertions(+) create mode 100644 drivers/gpu/drm/xe/xe_netlink.c diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index ed772f440689..048f9a23e2f0 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -87,6 +87,7 @@ xe-y += xe_bb.o \ xe_mmio.o \ xe_mocs.o \ xe_module.o \ + xe_netlink.o \ xe_pat.o \ xe_pci.o \ xe_pcode.o \ diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 628cb46a2509..8c928719a537 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -151,6 +151,8 @@ static void xe_driver_release(struct drm_device *dev) pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL); } +extern const struct driver_genl_ops xe_genl_ops[]; + static struct drm_driver driver = { /* Don't use MTRRs here; the Xserver or userspace app should * deal with them for Intel hardware. 
@@ -159,6 +161,7 @@ static struct drm_driver driver = { DRIVER_GEM | DRIVER_RENDER | DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, + .open = xe_file_open, .postclose = xe_file_close, @@ -170,6 +173,7 @@ static struct drm_driver driver = { .show_fdinfo = xe_drm_client_fdinfo, #endif .release = &xe_driver_release, + .genl_ops = xe_genl_ops, .ioctls = xe_ioctls, .num_ioctls = ARRAY_SIZE(xe_ioctls), diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index a1bacf820d37..8201f3644b86 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -10,6 +10,7 @@ #include #include +#include #include #include "xe_devcoredump_types.h" diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c new file mode 100644 index 000000000000..81d785455632 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_netlink.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ +#include "xe_device.h" + +static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + return 0; +} + +static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + return 0; +} + +/* driver callbacks to DRM netlink commands*/ +const struct driver_genl_ops xe_genl_ops[] = { + [DRM_RAS_CMD_QUERY] = { .doit = xe_genl_list_errors }, + [DRM_RAS_CMD_READ_ONE] = { .doit = xe_genl_read_error }, + [DRM_RAS_CMD_READ_ALL] = { .doit = xe_genl_list_errors, }, +}; From patchwork Fri Oct 20 15:58:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13430910 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B6A91C0032E for ; Fri, 20 Oct 2023 15:56:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5B46510E5D7; Fri, 20 Oct 2023 15:56:15 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 601F210E5D6; Fri, 20 Oct 2023 15:56:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697817371; x=1729353371; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=oC+cqMJGdm+RLq0fgOPetdBTTUMYcv+eD/sDVq4V1oU=; b=llopUwsyf45oJ/jj3S0yGXxhjnfNEJTN7jhafhQH1JFSyDzCzCpGJ8Om InkuvhdTQod8f2IGUwsugVbs+pmp0b8+0w1RNiOGzFOLv3ZVgjnplaSYp i/KNJwv4VwMZy9cLmeiIweXnWxUplk98LK9BmwDUxOL0ZTMJg+qNrqLde o4WaNTc5ULcrg7yq+SwRXZOh1MuYQ8aoOhiiwqJUyjm5mQ6D28Yu3AWbA pj8iRnoJOaqWPufYra+Qp512qTnom27JdcI6HC0z00bdeFrXPMjDH4irx poAF3X6UBREEPIp3Z+zYm1RGbApogE8ORzmEOpeMX4vWYIUrcjoXwBAPx Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="453000017" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="453000017" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="1088778598" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="1088778598" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:06 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, 
Felix.Kuehling@amd.com, Luben.Tuikov@amd.com, michael.j.ruhl@intel.com Subject: [RFC v3 3/5] drm/xe/RAS: Expose the error counters Date: Fri, 20 Oct 2023 21:28:33 +0530 Message-Id: <20231020155835.1295524-4-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> References: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Expose the error counters supported by the hardware to userspace through the registered genl subsystem commands. DRM_RAS_CMD_QUERY lists the error names with their config ids, DRM_RAS_CMD_READ_ONE returns the counter value for the requested config id, and DRM_RAS_CMD_READ_ALL lists the counters for all errors along with their names and config ids. v2: Rebase v3: 1. xe_list_errors currently fills blank data for IGFX; prevent it with an early IS_DGFX check (Michael J. Ruhl) 2. update errors from all sources Cc: Ruhl, Michael J Signed-off-by: Aravind Iddamsetty Reviewed-by: Michael J.
Ruhl --- drivers/gpu/drm/xe/xe_netlink.c | 499 +++++++++++++++++++++++++++++++- include/uapi/drm/xe_drm.h | 81 ++++++ 2 files changed, 578 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c index 81d785455632..3e4cdb5e4920 100644 --- a/drivers/gpu/drm/xe/xe_netlink.c +++ b/drivers/gpu/drm/xe/xe_netlink.c @@ -2,16 +2,511 @@ /* * Copyright © 2023 Intel Corporation */ +#include + #include "xe_device.h" -static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +#define MAX_ERROR_NAME 100 + +static const char * const xe_hw_error_events[] = { + [XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG] = "correctable-l3-sng", + [XE_GENL_GT_ERROR_CORRECTABLE_GUC] = "correctable-guc", + [XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER] = "correctable-sampler", + [XE_GENL_GT_ERROR_CORRECTABLE_SLM] = "correctable-slm", + [XE_GENL_GT_ERROR_CORRECTABLE_EU_IC] = "correctable-eu-ic", + [XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF] = "correctable-eu-grf", + [XE_GENL_GT_ERROR_FATAL_ARR_BIST] = "fatal-array-bist", + [XE_GENL_GT_ERROR_FATAL_L3_DOUB] = "fatal-l3-double", + [XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK] = "fatal-l3-ecc-checker", + [XE_GENL_GT_ERROR_FATAL_GUC] = "fatal-guc", + [XE_GENL_GT_ERROR_FATAL_IDI_PAR] = "fatal-idi-parity", + [XE_GENL_GT_ERROR_FATAL_SQIDI] = "fatal-sqidi", + [XE_GENL_GT_ERROR_FATAL_SAMPLER] = "fatal-sampler", + [XE_GENL_GT_ERROR_FATAL_SLM] = "fatal-slm", + [XE_GENL_GT_ERROR_FATAL_EU_IC] = "fatal-eu-ic", + [XE_GENL_GT_ERROR_FATAL_EU_GRF] = "fatal-eu-grf", + [XE_GENL_GT_ERROR_FATAL_FPU] = "fatal-fpu", + [XE_GENL_GT_ERROR_FATAL_TLB] = "fatal-tlb", + [XE_GENL_GT_ERROR_FATAL_L3_FABRIC] = "fatal-l3-fabric", + [XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE] = "correctable-subslice", + [XE_GENL_GT_ERROR_CORRECTABLE_L3BANK] = "correctable-l3bank", + [XE_GENL_GT_ERROR_FATAL_SUBSLICE] = "fatal-subslice", + [XE_GENL_GT_ERROR_FATAL_L3BANK] = "fatal-l3bank", + [XE_GENL_SGUNIT_ERROR_CORRECTABLE] = "sgunit-correctable", 
+ [XE_GENL_SGUNIT_ERROR_NONFATAL] = "sgunit-nonfatal", + [XE_GENL_SGUNIT_ERROR_FATAL] = "sgunit-fatal", + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMD] = "soc-nonfatal-csc-psf-cmd-parity", + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMP] = "soc-nonfatal-csc-psf-unexpected-completion", + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_REQ] = "soc-nonfatal-csc-psf-unsupported-request", + [XE_GENL_SOC_ERROR_NONFATAL_ANR_MDFI] = "soc-nonfatal-anr-mdfi", + [XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2T] = "soc-nonfatal-mdfi-t2t", + [XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2C] = "soc-nonfatal-mdfi-t2c", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 0)] = "soc-nonfatal-hbm-ss0-0", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 1)] = "soc-nonfatal-hbm-ss0-1", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 2)] = "soc-nonfatal-hbm-ss0-2", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 3)] = "soc-nonfatal-hbm-ss0-3", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 4)] = "soc-nonfatal-hbm-ss0-4", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 5)] = "soc-nonfatal-hbm-ss0-5", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 6)] = "soc-nonfatal-hbm-ss0-6", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 7)] = "soc-nonfatal-hbm-ss0-7", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 8)] = "soc-nonfatal-hbm-ss1-0", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 9)] = "soc-nonfatal-hbm-ss1-1", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 10)] = "soc-nonfatal-hbm-ss1-2", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 11)] = "soc-nonfatal-hbm-ss1-3", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 12)] = "soc-nonfatal-hbm-ss1-4", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 13)] = "soc-nonfatal-hbm-ss1-5", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 14)] = "soc-nonfatal-hbm-ss1-6", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 15)] = "soc-nonfatal-hbm-ss1-7", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 0)] = "soc-nonfatal-hbm-ss2-0", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 1)] = "soc-nonfatal-hbm-ss2-1", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 2)] = "soc-nonfatal-hbm-ss2-2", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 3)] = "soc-nonfatal-hbm-ss2-3", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 4)] = 
"soc-nonfatal-hbm-ss2-4", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 5)] = "soc-nonfatal-hbm-ss2-5", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 6)] = "soc-nonfatal-hbm-ss2-6", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 7)] = "soc-nonfatal-hbm-ss2-7", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 8)] = "soc-nonfatal-hbm-ss3-0", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 9)] = "soc-nonfatal-hbm-ss3-1", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 10)] = "soc-nonfatal-hbm-ss3-2", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 11)] = "soc-nonfatal-hbm-ss3-3", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 12)] = "soc-nonfatal-hbm-ss3-4", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 13)] = "soc-nonfatal-hbm-ss3-5", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 14)] = "soc-nonfatal-hbm-ss3-6", + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 15)] = "soc-nonfatal-hbm-ss3-7", + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMD] = "soc-fatal-csc-psf-cmd-parity", + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMP] = "soc-fatal-csc-psf-unexpected-completion", + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_REQ] = "soc-fatal-csc-psf-unsupported-request", + [XE_GENL_SOC_ERROR_FATAL_PUNIT] = "soc-fatal-punit", + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMD] = "soc-fatal-pcie-psf-command-parity", + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMP] = "soc-fatal-pcie-psf-unexpected-completion", + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_REQ] = "soc-fatal-pcie-psf-unsupported-request", + [XE_GENL_SOC_ERROR_FATAL_ANR_MDFI] = "soc-fatal-anr-mdfi", + [XE_GENL_SOC_ERROR_FATAL_MDFI_T2T] = "soc-fatal-mdfi-t2t", + [XE_GENL_SOC_ERROR_FATAL_MDFI_T2C] = "soc-fatal-mdfi-t2c", + [XE_GENL_SOC_ERROR_FATAL_PCIE_AER] = "soc-fatal-malformed-pcie-aer", + [XE_GENL_SOC_ERROR_FATAL_PCIE_ERR] = "soc-fatal-malformed-pcie-err", + [XE_GENL_SOC_ERROR_FATAL_UR_COND] = "soc-fatal-ur-condition-ieh", + [XE_GENL_SOC_ERROR_FATAL_SERR_SRCS] = "soc-fatal-from-serr-sources", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 0)] = "soc-fatal-hbm-ss0-0", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 1)] = "soc-fatal-hbm-ss0-1", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 2)] = "soc-fatal-hbm-ss0-2", + 
[XE_GENL_SOC_ERROR_FATAL_HBM(0, 3)] = "soc-fatal-hbm-ss0-3", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 4)] = "soc-fatal-hbm-ss0-4", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 5)] = "soc-fatal-hbm-ss0-5", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 6)] = "soc-fatal-hbm-ss0-6", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 7)] = "soc-fatal-hbm-ss0-7", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 8)] = "soc-fatal-hbm-ss1-0", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 9)] = "soc-fatal-hbm-ss1-1", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 10)] = "soc-fatal-hbm-ss1-2", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 11)] = "soc-fatal-hbm-ss1-3", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 12)] = "soc-fatal-hbm-ss1-4", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 13)] = "soc-fatal-hbm-ss1-5", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 14)] = "soc-fatal-hbm-ss1-6", + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 15)] = "soc-fatal-hbm-ss1-7", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 0)] = "soc-fatal-hbm-ss2-0", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 1)] = "soc-fatal-hbm-ss2-1", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 2)] = "soc-fatal-hbm-ss2-2", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 3)] = "soc-fatal-hbm-ss2-3", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 4)] = "soc-fatal-hbm-ss2-4", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 5)] = "soc-fatal-hbm-ss2-5", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 6)] = "soc-fatal-hbm-ss2-6", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 7)] = "soc-fatal-hbm-ss2-7", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 8)] = "soc-fatal-hbm-ss3-0", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 9)] = "soc-fatal-hbm-ss3-1", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 10)] = "soc-fatal-hbm-ss3-2", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 11)] = "soc-fatal-hbm-ss3-3", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 12)] = "soc-fatal-hbm-ss3-4", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 13)] = "soc-fatal-hbm-ss3-5", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 14)] = "soc-fatal-hbm-ss3-6", + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 15)] = "soc-fatal-hbm-ss3-7", + [XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC] = "gsc-correctable-sram-ecc", + [XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] = "gsc-nonfatal-mia-shutdown", + 
[XE_GENL_GSC_ERROR_NONFATAL_MIA_INTERNAL] = "gsc-nonfatal-mia-internal", + [XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC] = "gsc-nonfatal-sram-ecc", + [XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = "gsc-nonfatal-wdg-timeout", + [XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY] = "gsc-nonfatal-rom-parity", + [XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY] = "gsc-nonfatal-ucode-parity", + [XE_GENL_GSC_ERROR_NONFATAL_VLT_GLITCH] = "gsc-nonfatal-vlt-glitch", + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL] = "gsc-nonfatal-fuse-pull", + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = "gsc-nonfatal-fuse-crc-check", + [XE_GENL_GSC_ERROR_NONFATAL_SELF_MBIST] = "gsc-nonfatal-self-mbist", + [XE_GENL_GSC_ERROR_NONFATAL_AON_RF_PARITY] = "gsc-nonfatal-aon-parity", + [XE_GENL_SGGI_ERROR_NONFATAL] = "sggi-nonfatal-data-parity", + [XE_GENL_SGLI_ERROR_NONFATAL] = "sgli-nonfatal-data-parity", + [XE_GENL_SGCI_ERROR_NONFATAL] = "sgci-nonfatal-data-parity", + [XE_GENL_MERT_ERROR_NONFATAL] = "mert-nonfatal-data-parity", + [XE_GENL_SGGI_ERROR_FATAL] = "sggi-fatal-data-parity", + [XE_GENL_SGLI_ERROR_FATAL] = "sgli-fatal-data-parity", + [XE_GENL_SGCI_ERROR_FATAL] = "sgci-fatal-data-parity", + [XE_GENL_MERT_ERROR_FATAL] = "mert-fatal-data-parity", +}; + +static const unsigned long xe_hw_error_map[] = { + [XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG] = XE_HW_ERR_GT_CORR_L3_SNG, + [XE_GENL_GT_ERROR_CORRECTABLE_GUC] = XE_HW_ERR_GT_CORR_GUC, + [XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER] = XE_HW_ERR_GT_CORR_SAMPLER, + [XE_GENL_GT_ERROR_CORRECTABLE_SLM] = XE_HW_ERR_GT_CORR_SLM, + [XE_GENL_GT_ERROR_CORRECTABLE_EU_IC] = XE_HW_ERR_GT_CORR_EU_IC, + [XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF] = XE_HW_ERR_GT_CORR_EU_GRF, + [XE_GENL_GT_ERROR_FATAL_ARR_BIST] = XE_HW_ERR_GT_FATAL_ARR_BIST, + [XE_GENL_GT_ERROR_FATAL_L3_DOUB] = XE_HW_ERR_GT_FATAL_L3_DOUB, + [XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK] = XE_HW_ERR_GT_FATAL_L3_ECC_CHK, + [XE_GENL_GT_ERROR_FATAL_GUC] = XE_HW_ERR_GT_FATAL_GUC, + [XE_GENL_GT_ERROR_FATAL_IDI_PAR] = XE_HW_ERR_GT_FATAL_IDI_PAR, + 
[XE_GENL_GT_ERROR_FATAL_SQIDI] = XE_HW_ERR_GT_FATAL_SQIDI, + [XE_GENL_GT_ERROR_FATAL_SAMPLER] = XE_HW_ERR_GT_FATAL_SAMPLER, + [XE_GENL_GT_ERROR_FATAL_SLM] = XE_HW_ERR_GT_FATAL_SLM, + [XE_GENL_GT_ERROR_FATAL_EU_IC] = XE_HW_ERR_GT_FATAL_EU_IC, + [XE_GENL_GT_ERROR_FATAL_EU_GRF] = XE_HW_ERR_GT_FATAL_EU_GRF, + [XE_GENL_GT_ERROR_FATAL_FPU] = XE_HW_ERR_GT_FATAL_FPU, + [XE_GENL_GT_ERROR_FATAL_TLB] = XE_HW_ERR_GT_FATAL_TLB, + [XE_GENL_GT_ERROR_FATAL_L3_FABRIC] = XE_HW_ERR_GT_FATAL_L3_FABRIC, + [XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE] = XE_HW_ERR_GT_CORR_SUBSLICE, + [XE_GENL_GT_ERROR_CORRECTABLE_L3BANK] = XE_HW_ERR_GT_CORR_L3BANK, + [XE_GENL_GT_ERROR_FATAL_SUBSLICE] = XE_HW_ERR_GT_FATAL_SUBSLICE, + [XE_GENL_GT_ERROR_FATAL_L3BANK] = XE_HW_ERR_GT_FATAL_L3BANK, + [XE_GENL_SGUNIT_ERROR_CORRECTABLE] = XE_HW_ERR_TILE_CORR_SGUNIT, + [XE_GENL_SGUNIT_ERROR_NONFATAL] = XE_HW_ERR_TILE_NONFATAL_SGUNIT, + [XE_GENL_SGUNIT_ERROR_FATAL] = XE_HW_ERR_TILE_FATAL_SGUNIT, + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMD] = XE_HW_ERR_SOC_NONFATAL_CSC_PSF_CMD, + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMP] = XE_HW_ERR_SOC_NONFATAL_CSC_PSF_CMP, + [XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_REQ] = XE_HW_ERR_SOC_NONFATAL_CSC_PSF_REQ, + [XE_GENL_SOC_ERROR_NONFATAL_ANR_MDFI] = XE_HW_ERR_SOC_NONFATAL_ANR_MDFI, + [XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2T] = XE_HW_ERR_SOC_NONFATAL_MDFI_T2T, + [XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2C] = XE_HW_ERR_SOC_NONFATAL_MDFI_T2C, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 0)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL0, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 1)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL1, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 2)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL2, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 3)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL3, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 4)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL4, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 5)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL5, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 6)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL6, + 
[XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 7)] = XE_HW_ERR_SOC_NONFATAL_HBM0_CHNL7, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 8)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL0, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 9)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL1, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 10)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL2, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 11)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL3, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 12)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL4, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 13)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL5, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 14)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL6, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(0, 15)] = XE_HW_ERR_SOC_NONFATAL_HBM1_CHNL7, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 0)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL0, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 1)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL1, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 2)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL2, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 3)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL3, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 4)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL4, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 5)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL5, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 6)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL6, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 7)] = XE_HW_ERR_SOC_NONFATAL_HBM2_CHNL7, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 8)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL0, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 9)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL1, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 10)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL2, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 11)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL3, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 12)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL4, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 13)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL5, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 14)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL6, + [XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 15)] = XE_HW_ERR_SOC_NONFATAL_HBM3_CHNL7, + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMD] 
= XE_HW_ERR_SOC_FATAL_CSC_PSF_CMD, + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMP] = XE_HW_ERR_SOC_FATAL_CSC_PSF_CMP, + [XE_GENL_SOC_ERROR_FATAL_CSC_PSF_REQ] = XE_HW_ERR_SOC_FATAL_CSC_PSF_REQ, + [XE_GENL_SOC_ERROR_FATAL_PUNIT] = XE_HW_ERR_SOC_FATAL_PUNIT, + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMD] = XE_HW_ERR_SOC_FATAL_PCIE_PSF_CMD, + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMP] = XE_HW_ERR_SOC_FATAL_PCIE_PSF_CMP, + [XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_REQ] = XE_HW_ERR_SOC_FATAL_PCIE_PSF_REQ, + [XE_GENL_SOC_ERROR_FATAL_ANR_MDFI] = XE_HW_ERR_SOC_FATAL_ANR_MDFI, + [XE_GENL_SOC_ERROR_FATAL_MDFI_T2T] = XE_HW_ERR_SOC_FATAL_MDFI_T2T, + [XE_GENL_SOC_ERROR_FATAL_MDFI_T2C] = XE_HW_ERR_SOC_FATAL_MDFI_T2C, + [XE_GENL_SOC_ERROR_FATAL_PCIE_AER] = XE_HW_ERR_SOC_FATAL_PCIE_AER, + [XE_GENL_SOC_ERROR_FATAL_PCIE_ERR] = XE_HW_ERR_SOC_FATAL_PCIE_ERR, + [XE_GENL_SOC_ERROR_FATAL_UR_COND] = XE_HW_ERR_SOC_FATAL_UR_COND, + [XE_GENL_SOC_ERROR_FATAL_SERR_SRCS] = XE_HW_ERR_SOC_FATAL_SERR_SRCS, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 0)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL0, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 1)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL1, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 2)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL2, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 3)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL3, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 4)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL4, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 5)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL5, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 6)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL6, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 7)] = XE_HW_ERR_SOC_FATAL_HBM0_CHNL7, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 8)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL0, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 9)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL1, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 10)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL2, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 11)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL3, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 12)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL4, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 13)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL5, + 
[XE_GENL_SOC_ERROR_FATAL_HBM(0, 14)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL6, + [XE_GENL_SOC_ERROR_FATAL_HBM(0, 15)] = XE_HW_ERR_SOC_FATAL_HBM1_CHNL7, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 0)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL0, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 1)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL1, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 2)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL2, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 3)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL3, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 4)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL4, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 5)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL5, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 6)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL6, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 7)] = XE_HW_ERR_SOC_FATAL_HBM2_CHNL7, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 8)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL0, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 9)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL1, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 10)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL2, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 11)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL3, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 12)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL4, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 13)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL5, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 14)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL6, + [XE_GENL_SOC_ERROR_FATAL_HBM(1, 15)] = XE_HW_ERR_SOC_FATAL_HBM3_CHNL7, + [XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC] = XE_HW_ERR_GSC_CORR_SRAM, + [XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] = XE_HW_ERR_GSC_NONFATAL_MIA_SHUTDOWN, + [XE_GENL_GSC_ERROR_NONFATAL_MIA_INTERNAL] = XE_HW_ERR_GSC_NONFATAL_MIA_INTERNAL, + [XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC] = XE_HW_ERR_GSC_NONFATAL_SRAM, + [XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = XE_HW_ERR_GSC_NONFATAL_WDG, + [XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY] = XE_HW_ERR_GSC_NONFATAL_ROM_PARITY, + [XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY] = XE_HW_ERR_GSC_NONFATAL_UCODE_PARITY, + [XE_GENL_GSC_ERROR_NONFATAL_VLT_GLITCH] = XE_HW_ERR_GSC_NONFATAL_VLT_GLITCH, + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL] = XE_HW_ERR_GSC_NONFATAL_FUSE_PULL, 
+ [XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = XE_HW_ERR_GSC_NONFATAL_FUSE_CRC, + [XE_GENL_GSC_ERROR_NONFATAL_SELF_MBIST] = XE_HW_ERR_GSC_NONFATAL_SELF_MBIST, + [XE_GENL_GSC_ERROR_NONFATAL_AON_RF_PARITY] = XE_HW_ERR_GSC_NONFATAL_AON_RF_PARITY, + [XE_GENL_SGGI_ERROR_NONFATAL] = XE_HW_ERR_TILE_NONFATAL_SGGI, + [XE_GENL_SGLI_ERROR_NONFATAL] = XE_HW_ERR_TILE_NONFATAL_SGLI, + [XE_GENL_SGCI_ERROR_NONFATAL] = XE_HW_ERR_TILE_NONFATAL_SGCI, + [XE_GENL_MERT_ERROR_NONFATAL] = XE_HW_ERR_TILE_NONFATAL_MERT, + [XE_GENL_SGGI_ERROR_FATAL] = XE_HW_ERR_TILE_FATAL_SGGI, + [XE_GENL_SGLI_ERROR_FATAL] = XE_HW_ERR_TILE_FATAL_SGLI, + [XE_GENL_SGCI_ERROR_FATAL] = XE_HW_ERR_TILE_FATAL_SGCI, + [XE_GENL_MERT_ERROR_FATAL] = XE_HW_ERR_TILE_FATAL_MERT, +}; + +static unsigned int config_gt_id(const u64 config) +{ + return config >> __XE_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) { + return config & ~(~0ULL << __XE_PMU_GT_SHIFT); +} + +static bool is_gt_error(const u64 config) +{ + unsigned int error; + + error = config_counter(config); + if (error <= XE_GENL_GT_ERROR_FATAL_FPU) + return true; + + return false; +} + +static bool is_gt_vector_error(const u64 config) +{ + unsigned int error; + + error = config_counter(config); + if (error >= XE_GENL_GT_ERROR_FATAL_TLB && + error <= XE_GENL_GT_ERROR_FATAL_L3BANK) + return true; + + return false; +} + +static bool is_pvc_invalid_gt_errors(const u64 config) +{ + switch (config_counter(config)) { + case XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG: + case XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER: + case XE_GENL_GT_ERROR_FATAL_ARR_BIST: + case XE_GENL_GT_ERROR_FATAL_L3_DOUB: + case XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK: + case XE_GENL_GT_ERROR_FATAL_IDI_PAR: + case XE_GENL_GT_ERROR_FATAL_SQIDI: + case XE_GENL_GT_ERROR_FATAL_SAMPLER: + case XE_GENL_GT_ERROR_FATAL_EU_IC: + return true; + default: + return false; + } +} + +static bool is_gsc_hw_error(const u64 config) +{ + if (config_counter(config) >= XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC && + 
config_counter(config) <= XE_GENL_GSC_ERROR_NONFATAL_AON_RF_PARITY) + return true; + + return false; +} + +static bool is_soc_error(const u64 config) +{ + if (config_counter(config) >= XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMD && + config_counter(config) <= XE_GENL_SOC_ERROR_FATAL_HBM(1, 15)) + return true; + + return false; +} + +static int +config_status(struct xe_device *xe, u64 config) +{ + unsigned int gt_id = config_gt_id(config); + struct xe_gt *gt = xe_device_get_gt(xe, gt_id); + + if (!IS_DGFX(xe)) + return -ENODEV; + + if (gt->info.type == XE_GT_TYPE_UNINITIALIZED) + return -ENOENT; + + /* GSC HW errors are present only on the root tile of + * platforms supporting memory sparing + */ + if (is_gsc_hw_error(config) && !(xe->info.platform == XE_PVC && !gt_id)) + return -ENODEV; + + /* GT vector errors are valid only on platforms supporting error vectors */ + if (is_gt_vector_error(config) && xe->info.platform != XE_PVC) + return -ENODEV; + + /* Skip GT errors not supported on PVC */ + if (is_pvc_invalid_gt_errors(config) && xe->info.platform == XE_PVC) + return -ENODEV; + + /* FATAL FPU error is valid on PVC only */ + if (config_counter(config) == XE_GENL_GT_ERROR_FATAL_FPU && + !(xe->info.platform == XE_PVC)) + return -ENODEV; + + if (is_soc_error(config) && !(xe->info.platform == XE_PVC)) + return -ENODEV; + + return (config_counter(config) >= + ARRAY_SIZE(xe_hw_error_map)) ? 
-ENOENT : 0; +} + +static u64 get_counter_value(struct xe_device *xe, u64 config) +{ + const unsigned int gt_id = config_gt_id(config); + struct xe_gt *gt = xe_device_get_gt(xe, gt_id); + unsigned int id = config_counter(config); + + if (is_gt_error(config) || is_gt_vector_error(config)) + return xa_to_value(xa_load(&gt->errors.hw_error, xe_hw_error_map[id])); + + return xa_to_value(xa_load(&gt->tile->errors.hw_error, xe_hw_error_map[id])); +} + +int fill_error_details(struct xe_device *xe, struct genl_info *info, struct sk_buff *new_msg) +{ + struct nlattr *entry_attr; + bool counter = false; + struct xe_gt *gt; + int i, j; + + BUILD_BUG_ON(ARRAY_SIZE(xe_hw_error_events) != + ARRAY_SIZE(xe_hw_error_map)); + + if (info->genlhdr->cmd == DRM_RAS_CMD_READ_ALL) + counter = true; + + entry_attr = nla_nest_start(new_msg, DRM_RAS_ATTR_QUERY_REPLY); + if (!entry_attr) + return -EMSGSIZE; + + for_each_gt(gt, xe, j) { + char str[MAX_ERROR_NAME]; + u64 val; + + for (i = 0; i < ARRAY_SIZE(xe_hw_error_events); i++) { + u64 config = XE_HW_ERROR(j, i); + + if (config_status(xe, config)) + continue; + + /* should this be cleared every time */ + snprintf(str, sizeof(str), "error-gt%d-%s", j, xe_hw_error_events[i]); + + if (nla_put_string(new_msg, DRM_RAS_ATTR_ERROR_NAME, str)) + goto err; + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_ID, config, DRM_ATTR_PAD)) + goto err; + if (counter) { + val = get_counter_value(xe, config); + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD)) + goto err; + } + } + } + + nla_nest_end(new_msg, entry_attr); + return 0; +err: + drm_dbg_driver(&xe->drm, "msg buff is small\n"); + nla_nest_cancel(new_msg, entry_attr); + nlmsg_free(new_msg); + + return -EMSGSIZE; +} + +static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + int retries = 2; + void *usrhdr; + int ret = 
0; + + if (!IS_DGFX(xe)) + return -ENODEV; + + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + ret = fill_error_details(xe, info, new_msg); + if (!ret) + break; + + msg_size += NLMSG_DEFAULT_SIZE; + } while (retries--); + + if (!ret) + ret = drm_genl_reply(new_msg, info, usrhdr); + + return ret; } static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) { - return 0; + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + void *usrhdr; + int ret = 0; + int retries = 2; + u64 config, val; + + config = nla_get_u64(info->attrs[DRM_RAS_ATTR_ERROR_ID]); + ret = config_status(xe, config); + if (ret) + return ret; + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + val = get_counter_value(xe, config); + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD)) { + msg_size += NLMSG_DEFAULT_SIZE; + continue; + } + + break; + } while (retries--); + + ret = drm_genl_reply(new_msg, info, usrhdr); + + return ret; } /* driver callbacks to DRM netlink commands*/ diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 60cc6418d9a7..dbb3f1afba5f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -1087,6 +1087,87 @@ struct drm_xe_vm_madvise { #define XE_PMU_MEDIA_GROUP_BUSY(gt) ___XE_PMU_OTHER(gt, 3) #define XE_PMU_ANY_ENGINE_GROUP_BUSY(gt) ___XE_PMU_OTHER(gt, 4) +/** + * DOC: XE GENL netlink event IDs + * TODO: Add more details + */ +#define XE_HW_ERROR(gt, id) \ + ((id) | ((__u64)(gt) << __XE_PMU_GT_SHIFT)) + +#define XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG (0) +#define XE_GENL_GT_ERROR_CORRECTABLE_GUC (1) +#define XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER (2) +#define XE_GENL_GT_ERROR_CORRECTABLE_SLM (3) +#define XE_GENL_GT_ERROR_CORRECTABLE_EU_IC (4) +#define XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF (5) +#define 
XE_GENL_GT_ERROR_FATAL_ARR_BIST (6) +#define XE_GENL_GT_ERROR_FATAL_L3_DOUB (7) +#define XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK (8) +#define XE_GENL_GT_ERROR_FATAL_GUC (9) +#define XE_GENL_GT_ERROR_FATAL_IDI_PAR (10) +#define XE_GENL_GT_ERROR_FATAL_SQIDI (11) +#define XE_GENL_GT_ERROR_FATAL_SAMPLER (12) +#define XE_GENL_GT_ERROR_FATAL_SLM (13) +#define XE_GENL_GT_ERROR_FATAL_EU_IC (14) +#define XE_GENL_GT_ERROR_FATAL_EU_GRF (15) +#define XE_GENL_GT_ERROR_FATAL_FPU (16) +#define XE_GENL_GT_ERROR_FATAL_TLB (17) +#define XE_GENL_GT_ERROR_FATAL_L3_FABRIC (18) +#define XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE (19) +#define XE_GENL_GT_ERROR_CORRECTABLE_L3BANK (20) +#define XE_GENL_GT_ERROR_FATAL_SUBSLICE (21) +#define XE_GENL_GT_ERROR_FATAL_L3BANK (22) +#define XE_GENL_SGUNIT_ERROR_CORRECTABLE (23) +#define XE_GENL_SGUNIT_ERROR_NONFATAL (24) +#define XE_GENL_SGUNIT_ERROR_FATAL (25) +#define XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMD (26) +#define XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_CMP (27) +#define XE_GENL_SOC_ERROR_NONFATAL_CSC_PSF_REQ (28) +#define XE_GENL_SOC_ERROR_NONFATAL_ANR_MDFI (29) +#define XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2T (30) +#define XE_GENL_SOC_ERROR_NONFATAL_MDFI_T2C (31) +#define XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMD (32) +#define XE_GENL_SOC_ERROR_FATAL_CSC_PSF_CMP (33) +#define XE_GENL_SOC_ERROR_FATAL_CSC_PSF_REQ (34) +#define XE_GENL_SOC_ERROR_FATAL_PUNIT (35) +#define XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMD (36) +#define XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_CMP (37) +#define XE_GENL_SOC_ERROR_FATAL_PCIE_PSF_REQ (38) +#define XE_GENL_SOC_ERROR_FATAL_ANR_MDFI (39) +#define XE_GENL_SOC_ERROR_FATAL_MDFI_T2T (40) +#define XE_GENL_SOC_ERROR_FATAL_MDFI_T2C (41) +#define XE_GENL_SOC_ERROR_FATAL_PCIE_AER (42) +#define XE_GENL_SOC_ERROR_FATAL_PCIE_ERR (43) +#define XE_GENL_SOC_ERROR_FATAL_UR_COND (44) +#define XE_GENL_SOC_ERROR_FATAL_SERR_SRCS (45) + +#define XE_GENL_SOC_ERROR_NONFATAL_HBM(ss, n)\ + (XE_GENL_SOC_ERROR_FATAL_SERR_SRCS + 0x1 + (ss) * 0x10 + (n)) +#define 
XE_GENL_SOC_ERROR_FATAL_HBM(ss, n)\ + (XE_GENL_SOC_ERROR_NONFATAL_HBM(1, 15) + 0x1 + (ss) * 0x10 + (n)) + +/* 109 is the last ID used by SOC errors */ +#define XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC (110) +#define XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN (111) +#define XE_GENL_GSC_ERROR_NONFATAL_MIA_INTERNAL (112) +#define XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC (113) +#define XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT (114) +#define XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY (115) +#define XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY (116) +#define XE_GENL_GSC_ERROR_NONFATAL_VLT_GLITCH (117) +#define XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL (118) +#define XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK (119) +#define XE_GENL_GSC_ERROR_NONFATAL_SELF_MBIST (120) +#define XE_GENL_GSC_ERROR_NONFATAL_AON_RF_PARITY (121) +#define XE_GENL_SGGI_ERROR_NONFATAL (122) +#define XE_GENL_SGLI_ERROR_NONFATAL (123) +#define XE_GENL_SGCI_ERROR_NONFATAL (124) +#define XE_GENL_MERT_ERROR_NONFATAL (125) +#define XE_GENL_SGGI_ERROR_FATAL (126) +#define XE_GENL_SGLI_ERROR_FATAL (127) +#define XE_GENL_SGCI_ERROR_FATAL (128) +#define XE_GENL_MERT_ERROR_FATAL (129) + #if defined(__cplusplus) } #endif From patchwork Fri Oct 20 15:58:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13430909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32CADCDB474 for ; Fri, 20 Oct 2023 15:56:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 17FD410E0F5; Fri, 20 Oct 2023 15:56:18 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) 
by gabe.freedesktop.org (Postfix) with ESMTPS id 46B9D10E0F5; Fri, 20 Oct 2023 15:56:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697817375; x=1729353375; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=T7l4uyziph6sz0NMKx+fmL3JwY0/VxDeGTVi9nauYxY=; b=CWcqancDREXBw9f/CGpflGdbq1ywyap6zzrrP2d+nefD62rZWXqppSdQ 3geU0HHhcf2aeQX/s2Krsgh4nIh4+rJFaTZ7V3Nd6NTrOuRZP+zByAWsM WvufOqaSPYIOPe/aUEy47THm8b3C5tbKbF6c5yA9wE8MSaMrActDUWF4F Ai5Oo3g0wqC337oVozTRMS1QTg/+x2WrLnheXBstS1uFNEsmEQUjCe6Ss 0hJ+ahiOolQGmMWAumhzQYgMZpeHCEr2Ph0ChO7eYkMfwCZ2Qp42/Xst4 YJehIor6EYjww+h8OUDp8xi+/LcTWzK+horcQKcUVR4wyIznp1xnWJb/y w==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="453000036" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="453000036" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="1088778614" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="1088778614" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:10 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com, michael.j.ruhl@intel.com Subject: [RFC 4/5] drm/netlink: Define multicast groups Date: Fri, 20 Oct 2023 21:28:34 +0530 Message-Id: <20231020155835.1295524-5-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> References: 
<20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" The netlink subsystem supports event notifications to userspace. Define two multicast groups, one for correctable and one for uncorrectable errors, to which userspace can subscribe to be notified when any of those errors happen. The group names are local to the driver's genl netlink family. Signed-off-by: Aravind Iddamsetty Reviewed-by: Michael J. Ruhl --- drivers/gpu/drm/drm_netlink.c | 7 +++++++ include/drm/drm_netlink.h | 5 +++++ include/uapi/drm/drm_netlink.h | 4 ++++ 3 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c index 8add249c1da3..425a7355a573 100644 --- a/drivers/gpu/drm/drm_netlink.c +++ b/drivers/gpu/drm/drm_netlink.c @@ -12,6 +12,11 @@ DEFINE_XARRAY(drm_dev_xarray); +static const struct genl_multicast_group drm_event_mcgrps[] = { + [DRM_GENL_MCAST_CORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_CORR_ERR, }, + [DRM_GENL_MCAST_UNCORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR, }, +}; + /** * drm_genl_reply - response to a request * @msg: socket buffer @@ -133,6 +138,8 @@ static void drm_genl_family_init(struct drm_device *dev) dev->drm_genl_family.ops = drm_genl_ops; dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops); dev->drm_genl_family.maxattr = DRM_ATTR_MAX; + dev->drm_genl_family.mcgrps = drm_event_mcgrps; + dev->drm_genl_family.n_mcgrps = ARRAY_SIZE(drm_event_mcgrps); dev->drm_genl_family.module = dev->dev->driver->owner; } diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h index 54527dae7847..758239643c17 100644 --- a/include/drm/drm_netlink.h +++ b/include/drm/drm_netlink.h @@ -13,6 +13,11 @@ struct drm_device; 
+enum mcgrps_events { + DRM_GENL_MCAST_CORR_ERR, + DRM_GENL_MCAST_UNCORR_ERR, +}; + struct driver_genl_ops { int (*doit)(struct drm_device *dev, struct sk_buff *skb, diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h index aab42147a20e..c7a0ce5b4624 100644 --- a/include/uapi/drm/drm_netlink.h +++ b/include/uapi/drm/drm_netlink.h @@ -26,6 +26,8 @@ #define _DRM_NETLINK_H_ #define DRM_GENL_VERSION 1 +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" #if defined(__cplusplus) extern "C" { @@ -43,6 +45,8 @@ enum drm_genl_error_cmds { DRM_RAS_CMD_READ_ONE, /** @DRM_RAS_CMD_READ_ALL: Command to get counters of all errors */ DRM_RAS_CMD_READ_ALL, + /** @DRM_RAS_CMD_ERROR_EVENT: Command sent as part of multicast event */ + DRM_RAS_CMD_ERROR_EVENT, __DRM_CMD_MAX, DRM_CMD_MAX = __DRM_CMD_MAX - 1, From patchwork Fri Oct 20 15:58:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13430911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id ECE9ECDB474 for ; Fri, 20 Oct 2023 15:56:26 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 25A0E10E5D6; Fri, 20 Oct 2023 15:56:26 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8887F10E5DA; Fri, 20 Oct 2023 15:56:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697817379; x=1729353379; 
h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=CHmOEA7LHes6zRhZJc6PG3j+zI4CzTMkHzHxbBQXHvY=; b=ZDdaJpd1bXFCjGOdO+dxjH48qeIE+vnh7Qcg69d0ut8dFclDrqlVXSKL vLm9MfZ/nq67yInqOCkZroJJSCqg8hqYKbi3vs/YaLxx1fPBAGFTqZBzg ZBMzUcYEptbToLPuxOUng3Ag5HmrcZO2QVHf72kvHnKnOlotdxw1pss/j K3qZvYfpFWLP+LiO5291NvAGE9a1OpV85tMjcOjMY3/AwhNi1BcXsO00L IONl08lN+/qte7AB84kf3Jx2sTE4gDyavwu+JibG6QXB8ecoYAgqCjjSQ ToHypSEWIXP0SlwfbRlvzefcI7vNmWQzAtKSbGyGRNjeWtdZteksy4B+I A==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="453000047" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="453000047" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="1088778671" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="1088778671" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:56:15 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com, michael.j.ruhl@intel.com Subject: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error Date: Fri, 20 Oct 2023 21:28:35 +0530 Message-Id: <20231020155835.1295524-6-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> References: <20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Whenever a correctable or an uncorrectable error happens, an event is sent to the listeners of the corresponding multicast group. v2: Rebase Signed-off-by: Aravind Iddamsetty Reviewed-by: Michael J. Ruhl --- drivers/gpu/drm/xe/xe_hw_error.c | 33 ++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c index bab6d4cf0b69..b0befb5e01cb 100644 --- a/drivers/gpu/drm/xe/xe_hw_error.c +++ b/drivers/gpu/drm/xe/xe_hw_error.c @@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err) (HARDWARE_ERROR_MAX << 1) + 1); } +static void +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err) +{ + struct sk_buff *msg; + void *hdr; + + if (!xe->drm.drm_genl_family.module) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC); + if (!msg) { + drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n"); + return; + } + + hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT); + if (!hdr) { + drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n"); + nlmsg_free(msg); + return; + } + + genlmsg_end(msg, hdr); + + genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0, + hw_err ? + DRM_GENL_MCAST_UNCORR_ERR + : DRM_GENL_MCAST_CORR_ERR, + GFP_ATOMIC); +} + static void xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err) { @@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er } xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc); + + generate_netlink_event(tile_to_xe(tile), hw_err); unlock: spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags); }
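
Note on the error ID layout: the multicast event above carries no payload, so a listener would presumably follow up with DRM_RAS_CMD_READ_ONE or DRM_RAS_CMD_READ_ALL and decode the 64-bit error IDs itself. The userspace sketch below mirrors the XE_HW_ERROR() packing and the HBM ID arithmetic from the xe_drm.h hunk earlier in this series. It is an illustration only and not part of the patches: all sketch_* names are hypothetical, and SKETCH_GT_SHIFT = 56 is an assumption about the value of __XE_PMU_GT_SHIFT, which is defined outside the diffs shown here.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: stands in for __XE_PMU_GT_SHIFT, whose value is not
 * visible in this series. The GT index occupies the bits above it. */
#define SKETCH_GT_SHIFT 56

/* Mirrors XE_GENL_SOC_ERROR_FATAL_SERR_SRCS (45) from the uapi header. */
#define SKETCH_SOC_FATAL_SERR_SRCS 45

/* Pack a GT index and an error ID, as XE_HW_ERROR(gt, id) does. */
static inline uint64_t sketch_hw_error(unsigned int gt, uint64_t id)
{
	return id | ((uint64_t)gt << SKETCH_GT_SHIFT);
}

/* Recover the GT index, as config_gt_id() does in the driver. */
static inline unsigned int sketch_config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> SKETCH_GT_SHIFT);
}

/* Recover the error ID, as config_counter() does in the driver. */
static inline uint64_t sketch_config_counter(uint64_t config)
{
	return config & ~(~0ULL << SKETCH_GT_SHIFT);
}

/* Mirrors XE_GENL_SOC_ERROR_NONFATAL_HBM(ss, n): 16 IDs per ss value. */
static inline unsigned int sketch_soc_nonfatal_hbm(unsigned int ss,
						   unsigned int n)
{
	return SKETCH_SOC_FATAL_SERR_SRCS + 0x1 + ss * 0x10 + n;
}

/* Mirrors XE_GENL_SOC_ERROR_FATAL_HBM(ss, n): starts right after the
 * last non-fatal HBM ID. */
static inline unsigned int sketch_soc_fatal_hbm(unsigned int ss,
						unsigned int n)
{
	return sketch_soc_nonfatal_hbm(1, 15) + 0x1 + ss * 0x10 + n;
}

/* Checks the header comment "109 is the last ID used by SOC errors". */
static inline void sketch_self_check(void)
{
	assert(sketch_soc_fatal_hbm(1, 15) == 109);
}
```

Under these assumptions the non-fatal HBM IDs span 46..77 and the fatal HBM IDs span 78..109, which agrees with the GSC IDs starting at 110 in the header.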