From patchwork Sun Oct 8 09:26:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Aravind Iddamsetty
X-Patchwork-Id: 13412665
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com
Subject: [RFC v3 1/5] drm/netlink: Add netlink infrastructure
Date: Sun, 8 Oct 2023 14:56:12 +0530
Message-Id: <20231008092616.1457995-2-aravind.iddamsetty@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
References: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
List-Id: Direct Rendering Infrastructure - Development

Define the netlink registration interface, along with the commands and attributes that can be shared across DRM drivers. This patch uses a generic netlink family to expose device statistics. At present it defines the commands used to expose RAS error counters.

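For illustration only (not part of the patch): a minimal standalone sketch of how the three RAS commands pair with the attribute each request is expected to carry. The enums mirror include/uapi/drm/drm_netlink.h as added by this patch; example_required_attr() is a hypothetical helper that restates the GENL_REQ_ATTR_CHECK() calls in drm_genl_list_errors() and drm_genl_read_error().

```c
#include <assert.h>

/* Mirrors enum drm_genl_error_cmds from include/uapi/drm/drm_netlink.h. */
enum drm_genl_error_cmds {
	DRM_CMD_UNSPEC,
	DRM_RAS_CMD_QUERY,
	DRM_RAS_CMD_READ_ONE,
	DRM_RAS_CMD_READ_ALL,

	__DRM_CMD_MAX,
	DRM_CMD_MAX = __DRM_CMD_MAX - 1,
};

/* Mirrors enum drm_error_attr from include/uapi/drm/drm_netlink.h. */
enum drm_error_attr {
	DRM_ATTR_UNSPEC,
	DRM_ATTR_PAD = DRM_ATTR_UNSPEC,
	DRM_RAS_ATTR_REQUEST,
	DRM_RAS_ATTR_QUERY_REPLY,
	DRM_RAS_ATTR_ERROR_NAME,
	DRM_RAS_ATTR_ERROR_ID,
	DRM_RAS_ATTR_ERROR_VALUE,

	__DRM_ATTR_MAX,
	DRM_ATTR_MAX = __DRM_ATTR_MAX - 1,
};

/* The "__MAX - 1" idiom makes DRM_ATTR_MAX the last real attribute. */
static_assert(DRM_ATTR_MAX == DRM_RAS_ATTR_ERROR_VALUE,
	      "DRM_ATTR_MAX must name the last attribute");

/*
 * Hypothetical helper (not in the patch): return the attribute a request
 * must carry for the given command, or DRM_ATTR_UNSPEC if the command is
 * not one of the RAS commands.
 */
enum drm_error_attr example_required_attr(int cmd)
{
	switch (cmd) {
	case DRM_RAS_CMD_QUERY:
	case DRM_RAS_CMD_READ_ALL:
		return DRM_RAS_ATTR_REQUEST;	/* NLA_U8 per the policy */
	case DRM_RAS_CMD_READ_ONE:
		return DRM_RAS_ATTR_ERROR_ID;	/* NLA_U64 per the policy */
	default:
		return DRM_ATTR_UNSPEC;
	}
}
```

Policy arrays in the patch are sized DRM_ATTR_MAX + 1 so that every attribute id, including DRM_ATTR_UNSPEC, has a slot.
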
v2: define common interfaces to the genl netlink subsystem that all drm drivers can leverage. (Tomer Tayar)
v3: drop the DRIVER_NETLINK flag and use the driver_genl_ops structure to register with the netlink subsystem. (Daniel Vetter)

Cc: Tomer Tayar
Cc: Daniel Vetter
Signed-off-by: Aravind Iddamsetty
---
 drivers/gpu/drm/Makefile | 1 +
 drivers/gpu/drm/drm_drv.c | 7 ++
 drivers/gpu/drm/drm_netlink.c | 175 +++++++++++++++++++++++++++++++++
 include/drm/drm_device.h | 8 ++
 include/drm/drm_drv.h | 7 ++
 include/drm/drm_netlink.h | 39 ++++++++
 include/uapi/drm/drm_netlink.h | 83 ++++++++++++++++
 7 files changed, 320 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_netlink.c
 create mode 100644 include/drm/drm_netlink.h
 create mode 100644 include/uapi/drm/drm_netlink.h
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 1ad88efb1752..d0d8dc730c43 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -35,6 +35,7 @@ drm-y := \ drm_mode_object.o \ drm_modes.o \ drm_modeset_lock.o \ + drm_netlink.o \ drm_plane.o \ drm_prime.o \ drm_print.o \ diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 3eda026ffac6..c10c955186ce 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -928,6 +928,12 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags) if (ret) goto err_minors; + if (driver->genl_ops) { + ret = drm_genl_register(dev); + if (ret) + goto err_minors; + } + ret = create_compat_control_link(dev); if (ret) goto err_minors; @@ -1064,6 +1070,7 @@ static void drm_core_exit(void) { drm_privacy_screen_lookup_exit(); accel_core_exit(); + drm_genl_exit(); unregister_chrdev(DRM_MAJOR, "drm"); debugfs_remove(drm_debugfs_root); drm_sysfs_destroy(); diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c new file mode 100644 index 000000000000..843b2919b6ed --- /dev/null +++ b/drivers/gpu/drm/drm_netlink.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel 
Corporation + */ + +#include +#include +#include +#include +#include +#include + +DEFINE_XARRAY(drm_dev_xarray); + +/** + * drm_genl_send - Send genl command + * @msg: socket buffer + * @info: receiver information + * @usrhdr: pointer to user specific header in the message buffer + * + * RETURNS: + * 0 on success and negative error code on failure + */ +int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr) +{ + int ret; + + genlmsg_end(msg, usrhdr); + + ret = genlmsg_reply(msg, info); + if (ret) + nlmsg_free(msg); + + return ret; +} +EXPORT_SYMBOL(drm_genl_send); + +/** + * drm_genl_alloc_msg - allocate genl message buffer + * @dev: drm_device for which the message is being allocated + * @info: receiver information + * @usrhdr: pointer to user specific header in the message buffer + * + * RETURNS: + * pointer to new allocated buffer on success, NULL on failure + */ +struct sk_buff * +drm_genl_alloc_msg(struct drm_device *dev, + struct genl_info *info, + size_t msg_size, void **usrhdr) +{ + struct sk_buff *new_msg; + + new_msg = genlmsg_new(msg_size, GFP_KERNEL); + if (!new_msg) + return new_msg; + + *usrhdr = genlmsg_put_reply(new_msg, info, &dev->drm_genl_family, 0, info->genlhdr->cmd); + if (!*usrhdr) { + nlmsg_free(new_msg); + new_msg = NULL; + } + + return new_msg; +} +EXPORT_SYMBOL(drm_genl_alloc_msg); + +static struct drm_device *genl_to_dev(struct genl_info *info) +{ + return xa_load(&drm_dev_xarray, info->nlhdr->nlmsg_type); +} + +static int drm_genl_list_errors(struct sk_buff *msg, struct genl_info *info) +{ + struct drm_device *dev = genl_to_dev(info); + + if (GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_REQUEST)) + return -EINVAL; + + if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit)) + return -EOPNOTSUPP; + + return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info); +} + +static int drm_genl_read_error(struct sk_buff *msg, struct genl_info *info) +{ + struct drm_device *dev = genl_to_dev(info); + + if 
(GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_ERROR_ID)) + return -EINVAL; + + if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit)) + return -EOPNOTSUPP; + + return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info); +} + +/* drm genl operations definition */ +const struct genl_ops drm_genl_ops[] = { + { + .cmd = DRM_RAS_CMD_QUERY, + .doit = drm_genl_list_errors, + .policy = drm_attr_policy_query, + }, + { + .cmd = DRM_RAS_CMD_READ_ONE, + .doit = drm_genl_read_error, + .policy = drm_attr_policy_read_one, + }, + { + .cmd = DRM_RAS_CMD_READ_ALL, + .doit = drm_genl_list_errors, + .policy = drm_attr_policy_query, + }, +}; + +static void drm_genl_family_init(struct drm_device *dev) +{ + /* Use drm primary node name eg: card0 to name the genl family */ + snprintf(dev->drm_genl_family.name, sizeof(dev->drm_genl_family.name), "%s", dev->primary->kdev->kobj.name); + dev->drm_genl_family.version = DRM_GENL_VERSION; + dev->drm_genl_family.parallel_ops = true; + dev->drm_genl_family.ops = drm_genl_ops; + dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops); + dev->drm_genl_family.maxattr = DRM_ATTR_MAX; + dev->drm_genl_family.module = dev->dev->driver->owner; +} + +static void drm_genl_deregister(struct drm_device *dev, void *arg) +{ + drm_dbg_driver(dev, "unregistering genl family %s\n", dev->drm_genl_family.name); + + xa_erase(&drm_dev_xarray, dev->drm_genl_family.id); + + genl_unregister_family(&dev->drm_genl_family); +} + +/** + * drm_genl_register - Register genl family + * @dev: drm_device for which genl family needs to be registered + * + * RETURNS: + * 0 on success and negative error code on failure + */ +int drm_genl_register(struct drm_device *dev) +{ + int ret; + + BUG_ON(!dev->driver->genl_ops); + + drm_genl_family_init(dev); + + ret = genl_register_family(&dev->drm_genl_family); + if (ret < 0) { + drm_warn(dev, "genl family registration failed\n"); + return ret; + } + + drm_dbg_driver(dev, "genl family id %d and name %s\n", dev->drm_genl_family.id, 
dev->drm_genl_family.name); + + xa_store(&drm_dev_xarray, dev->drm_genl_family.id, dev, GFP_KERNEL); + + ret = drmm_add_action_or_reset(dev, drm_genl_deregister, NULL); + + return ret; +} + +/** + * drm_genl_exit: destroy drm_dev_xarray + */ +void drm_genl_exit(void) +{ + xa_destroy(&drm_dev_xarray); +} diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index 7cf4afae2e79..a984f34a11a2 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -8,6 +8,7 @@ #include #include +#include struct drm_driver; struct drm_minor; @@ -326,6 +327,13 @@ struct drm_device { */ struct list_head debugfs_list; + /** + * @drm_genl_family: + * + * Generic netlink family registration structure. + */ + struct genl_family drm_genl_family; + /* Everything below here is for legacy driver, never use! */ /* private: */ #if IS_ENABLED(CONFIG_DRM_LEGACY) diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h index 9813fa759b75..480510714605 100644 --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -434,6 +434,13 @@ struct drm_driver { */ const struct file_operations *fops; + /** + * @genl_ops: + * + * Drivers private callback to genl commands + */ + const struct driver_genl_ops *genl_ops; + #ifdef CONFIG_DRM_LEGACY /* Everything below here is for legacy driver, never use! 
*/ /* private: */ diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h new file mode 100644 index 000000000000..60f7ee39ae0a --- /dev/null +++ b/include/drm/drm_netlink.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023 Intel Corporation + */ + +#ifndef __DRM_NETLINK_H__ +#define __DRM_NETLINK_H__ + +#include +#include +#include +#include + +struct drm_device; + +struct driver_genl_ops { + int (*doit)(struct drm_device *dev, + struct sk_buff *skb, + struct genl_info *info); +}; + +/* attribute policies */ +static const struct nla_policy drm_attr_policy_query[DRM_ATTR_MAX + 1] = { + [DRM_RAS_ATTR_REQUEST] = { .type = NLA_U8 }, +}; + +static const struct nla_policy drm_attr_policy_read_one[DRM_ATTR_MAX + 1] = { + [DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 }, +}; + +int drm_genl_register(struct drm_device *dev); +void drm_genl_exit(void); +int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr); +struct sk_buff * +drm_genl_alloc_msg(struct drm_device *dev, + struct genl_info *info, + size_t msg_size, void **usrhdr); +#endif + diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h new file mode 100644 index 000000000000..aab42147a20e --- /dev/null +++ b/include/uapi/drm/drm_netlink.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright 2023 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of 
the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef _DRM_NETLINK_H_ +#define _DRM_NETLINK_H_ + +#define DRM_GENL_VERSION 1 + +#if defined(__cplusplus) +extern "C" { +#endif + +/** + * enum drm_genl_error_cmds - Supported error commands + * + */ +enum drm_genl_error_cmds { + DRM_CMD_UNSPEC, + /** @DRM_RAS_CMD_QUERY: Command to list all error names with config-id */ + DRM_RAS_CMD_QUERY, + /** @DRM_RAS_CMD_READ_ONE: Command to get a counter for a specific error */ + DRM_RAS_CMD_READ_ONE, + /** @DRM_RAS_CMD_READ_ALL: Command to get counters of all errors */ + DRM_RAS_CMD_READ_ALL, + + __DRM_CMD_MAX, + DRM_CMD_MAX = __DRM_CMD_MAX - 1, +}; + +/** + * enum drm_error_attr - Attributes to use with drm_genl_error_cmds + * + */ +enum drm_error_attr { + DRM_ATTR_UNSPEC, + DRM_ATTR_PAD = DRM_ATTR_UNSPEC, + /** + * @DRM_RAS_ATTR_REQUEST: Should be used with DRM_RAS_CMD_QUERY, + * DRM_RAS_CMD_READ_ALL + */ + DRM_RAS_ATTR_REQUEST, /* NLA_U8 */ + /** + * @DRM_RAS_ATTR_QUERY_REPLY: First nested attribute sent as a + * response to DRM_RAS_CMD_QUERY, DRM_RAS_CMD_READ_ALL commands. 
+ */ + DRM_RAS_ATTR_QUERY_REPLY, /*NLA_NESTED*/ + /** @DRM_RAS_ATTR_ERROR_NAME: Used to pass error name */ + DRM_RAS_ATTR_ERROR_NAME, /* NLA_NUL_STRING */ + /** @DRM_RAS_ATTR_ERROR_ID: Used to pass error id */ + DRM_RAS_ATTR_ERROR_ID, /* NLA_U64 */ + /** @DRM_RAS_ATTR_ERROR_VALUE: Used to pass error value */ + DRM_RAS_ATTR_ERROR_VALUE, /* NLA_U64 */ + + __DRM_ATTR_MAX, + DRM_ATTR_MAX = __DRM_ATTR_MAX - 1, +}; + +#if defined(__cplusplus) +} +#endif + +#endif
From patchwork Sun Oct 8 09:26:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Aravind Iddamsetty
X-Patchwork-Id: 13412666
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com
Subject: [RFC v2 2/5] drm/xe/RAS: Register netlink capability
Date: Sun, 8 Oct 2023 14:56:13 +0530
Message-Id: <20231008092616.1457995-3-aravind.iddamsetty@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
References: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
List-Id: Direct Rendering Infrastructure - Development

Register netlink capability with the DRM and register the driver callbacks to DRM RAS netlink commands.

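For illustration only (not part of the patch): a minimal userspace sketch of the command-indexed callback table that the driver hands to the DRM core. The example_* types and functions are hypothetical stand-ins for struct drm_device and struct driver_genl_ops with simplified arguments. The bounds check in example_dispatch() is an addition for the sketch; the patch itself indexes genl_ops[] directly and only checks for a missing doit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Command ids as defined by patch 1/5, plus a table size for this sketch. */
enum {
	DRM_CMD_UNSPEC,
	DRM_RAS_CMD_QUERY,
	DRM_RAS_CMD_READ_ONE,
	DRM_RAS_CMD_READ_ALL,
	EXAMPLE_CMD_COUNT,
};

static_assert(DRM_RAS_CMD_READ_ALL < EXAMPLE_CMD_COUNT,
	      "table must cover every command");

/* Hypothetical stand-in for struct drm_device. */
struct example_device {
	int calls;	/* number of callbacks invoked so far */
};

/* Hypothetical stand-in for struct driver_genl_ops. */
struct example_genl_ops {
	int (*doit)(struct example_device *dev);
};

/* Driver callback: a real driver would build and send a reply here. */
int example_list_errors(struct example_device *dev)
{
	dev->calls++;
	return 0;
}

/* Designated initializers index the table by command id. */
const struct example_genl_ops example_ops[EXAMPLE_CMD_COUNT] = {
	[DRM_RAS_CMD_QUERY]    = { .doit = example_list_errors },
	[DRM_RAS_CMD_READ_ALL] = { .doit = example_list_errors },
	/* DRM_RAS_CMD_READ_ONE intentionally left without a callback. */
};

/* Core-side dispatch: look up the driver callback for cmd and call it. */
int example_dispatch(struct example_device *dev,
		     const struct example_genl_ops *ops, size_t n_ops,
		     unsigned int cmd)
{
	if (cmd >= n_ops || !ops[cmd].doit)
		return -EOPNOTSUPP;

	return ops[cmd].doit(dev);
}
```

Slots that are not named in the initializer are zero-filled, which is why a missing callback shows up as a NULL doit pointer.
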
v2: Move the netlink registration parts to the DRM subsystem (Tomer Tayar)

Cc: Tomer Tayar
Signed-off-by: Aravind Iddamsetty
---
 drivers/gpu/drm/xe/Makefile | 1 +
 drivers/gpu/drm/xe/xe_device.c | 4 ++++
 drivers/gpu/drm/xe/xe_device_types.h | 1 +
 drivers/gpu/drm/xe/xe_netlink.c | 22 ++++++++++++++++++++++
 4 files changed, 28 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_netlink.c
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index be1f3afec3dc..6bdea5ffb41b 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -86,6 +86,7 @@ xe-y += xe_bb.o \ xe_mmio.o \ xe_mocs.o \ xe_module.o \ + xe_netlink.o \ xe_pat.o \ xe_pci.o \ xe_pcode.o \ diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 687dc3d79a66..21bc304af70b 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -150,6 +150,8 @@ static void xe_driver_release(struct drm_device *dev) pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL); } +extern const struct driver_genl_ops xe_genl_ops[]; + static struct drm_driver driver = { /* Don't use MTRRs here; the Xserver or userspace app should * deal with them for Intel hardware. 
@@ -158,6 +160,7 @@ static struct drm_driver driver = { DRIVER_GEM | DRIVER_RENDER | DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, + .open = xe_file_open, .postclose = xe_file_close, @@ -169,6 +172,7 @@ static struct drm_driver driver = { .show_fdinfo = xe_drm_client_fdinfo, #endif .release = &xe_driver_release, + .genl_ops = xe_genl_ops, .ioctls = xe_ioctls, .num_ioctls = ARRAY_SIZE(xe_ioctls), diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index ff476a167be4..fc6e132954e1 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -10,6 +10,7 @@ #include #include +#include #include #include "xe_devcoredump_types.h" diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c new file mode 100644 index 000000000000..81d785455632 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_netlink.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ +#include "xe_device.h" + +static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + return 0; +} + +static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + return 0; +} + +/* driver callbacks to DRM netlink commands*/ +const struct driver_genl_ops xe_genl_ops[] = { + [DRM_RAS_CMD_QUERY] = { .doit = xe_genl_list_errors }, + [DRM_RAS_CMD_READ_ONE] = { .doit = xe_genl_read_error }, + [DRM_RAS_CMD_READ_ALL] = { .doit = xe_genl_list_errors, }, +};
From patchwork Sun Oct 8 09:26:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Aravind Iddamsetty
X-Patchwork-Id: 13412667
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com,
 Felix.Kuehling@amd.com, Luben.Tuikov@amd.com
Subject: [RFC v2 3/5] drm/xe/RAS: Expose the error counters
Date: Sun, 8 Oct 2023 14:56:14 +0530
Message-Id: <20231008092616.1457995-4-aravind.iddamsetty@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
References: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com>
List-Id: Direct Rendering Infrastructure - Development

Expose the error counters supported by the hardware to userspace through the genl subsystem, using the registered commands. DRM_RAS_CMD_QUERY lists the error names with their config ids, DRM_RAS_CMD_READ_ONE returns the counter value for the requested config id, and DRM_RAS_CMD_READ_ALL lists the counters for all errors along with their names and config ids.

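For illustration only (not part of the patch): a minimal standalone sketch of how a config id splits into a GT id and a counter index, following the config_gt_id() and config_counter() helpers in the diff. EXAMPLE_GT_SHIFT is an assumed value chosen for the sketch; the real __XE_PMU_GT_SHIFT comes from include/uapi/drm/xe_drm.h and is not shown in this series excerpt. All example_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shift, for illustration only; see __XE_PMU_GT_SHIFT in xe_drm.h. */
#define EXAMPLE_GT_SHIFT 56

static_assert(EXAMPLE_GT_SHIFT > 0 && EXAMPLE_GT_SHIFT < 64,
	      "shift must leave room for both fields in a u64");

/* Mask covering the counter bits below the GT id. */
#define EXAMPLE_COUNTER_MASK (~(~0ULL << EXAMPLE_GT_SHIFT))

/* Pack a GT id and a counter index into one config id. */
uint64_t example_make_config(unsigned int gt_id, uint64_t counter)
{
	return ((uint64_t)gt_id << EXAMPLE_GT_SHIFT) |
	       (counter & EXAMPLE_COUNTER_MASK);
}

/* Same computation as config_gt_id() in the diff: the high bits. */
unsigned int example_config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> EXAMPLE_GT_SHIFT);
}

/* Same computation as config_counter() in the diff: the low bits. */
uint64_t example_config_counter(uint64_t config)
{
	return config & EXAMPLE_COUNTER_MASK;
}
```

A userspace tool would take the config ids reported by DRM_RAS_CMD_QUERY and pass one back unchanged in DRM_RAS_ATTR_ERROR_ID for DRM_RAS_CMD_READ_ONE, so it does not normally need to build config ids itself.
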
v2: Rebase Signed-off-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_netlink.c | 396 +++++++++++++++++++++++++++++++- include/uapi/drm/xe_drm.h | 61 +++++ 2 files changed, 455 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c index 81d785455632..a9d70dfe81a8 100644 --- a/drivers/gpu/drm/xe/xe_netlink.c +++ b/drivers/gpu/drm/xe/xe_netlink.c @@ -2,16 +2,408 @@ /* * Copyright © 2023 Intel Corporation */ +#include + #include "xe_device.h" -static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +#define MAX_ERROR_NAME 50 + +static const char * const xe_hw_error_events[] = { + [XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG] = "correctable-l3-sng", + [XE_GENL_GT_ERROR_CORRECTABLE_GUC] = "correctable-guc", + [XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER] = "correctable-sampler", + [XE_GENL_GT_ERROR_CORRECTABLE_SLM] = "correctable-slm", + [XE_GENL_GT_ERROR_CORRECTABLE_EU_IC] = "correctable-eu-ic", + [XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF] = "correctable-eu-grf", + [XE_GENL_GT_ERROR_FATAL_ARR_BIST] = "fatal-array-bist", + [XE_GENL_GT_ERROR_FATAL_L3_DOUB] = "fatal-l3-double", + [XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK] = "fatal-l3-ecc-checker", + [XE_GENL_GT_ERROR_FATAL_GUC] = "fatal-guc", + [XE_GENL_GT_ERROR_FATAL_IDI_PAR] = "fatal-idi-parity", + [XE_GENL_GT_ERROR_FATAL_SQIDI] = "fatal-sqidi", + [XE_GENL_GT_ERROR_FATAL_SAMPLER] = "fatal-sampler", + [XE_GENL_GT_ERROR_FATAL_SLM] = "fatal-slm", + [XE_GENL_GT_ERROR_FATAL_EU_IC] = "fatal-eu-ic", + [XE_GENL_GT_ERROR_FATAL_EU_GRF] = "fatal-eu-grf", + [XE_GENL_GT_ERROR_FATAL_FPU] = "fatal-fpu", + [XE_GENL_GT_ERROR_FATAL_TLB] = "fatal-tlb", + [XE_GENL_GT_ERROR_FATAL_L3_FABRIC] = "fatal-l3-fabric", + [XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE] = "correctable-subslice", + [XE_GENL_GT_ERROR_CORRECTABLE_L3BANK] = "correctable-l3bank", + [XE_GENL_GT_ERROR_FATAL_SUBSLICE] = "fatal-subslice", + [XE_GENL_GT_ERROR_FATAL_L3BANK] = "fatal-l3bank", + 
[XE_GENL_SGUNIT_ERROR_CORRECTABLE] = "sgunit-correctable", + [XE_GENL_SGUNIT_ERROR_NONFATAL] = "sgunit-nonfatal", + [XE_GENL_SGUNIT_ERROR_FATAL] = "sgunit-fatal", + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_0] = "soc-fatal-psf-csc-0", + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_1] = "soc-fatal-psf-csc-1", + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_2] = "soc-fatal-psf-csc-2", + [XE_GENL_SOC_ERROR_FATAL_PUNIT] = "soc-fatal-punit", + [XE_GENL_PVC_SOC_ERROR_FATAL_PSF_0] = "soc-fatal-psf-0", + [XE_GENL_PVC_SOC_ERROR_FATAL_PSF_1] = "soc-fatal-psf-1", + [XE_GENL_PVC_SOC_ERROR_FATAL_PSF_2] = "soc-fatal-psf-2", + [XE_GENL_PVC_SOC_ERROR_FATAL_CD0_MDFI] = "soc-fatal-cd0-mdfi", + [XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_EAST] = "soc-fatal-mdfi-east", + [XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_SOUTH] = "soc-fatal-mdfi-south", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 0)] = "soc-fatal-hbm-ss0-0", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 1)] = "soc-fatal-hbm-ss0-1", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 2)] = "soc-fatal-hbm-ss0-2", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 3)] = "soc-fatal-hbm-ss0-3", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 4)] = "soc-fatal-hbm-ss0-4", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 5)] = "soc-fatal-hbm-ss0-5", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 6)] = "soc-fatal-hbm-ss0-6", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 7)] = "soc-fatal-hbm-ss0-7", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 8)] = "soc-fatal-hbm-ss1-0", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 9)] = "soc-fatal-hbm-ss1-1", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 10)] = "soc-fatal-hbm-ss1-2", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 11)] = "soc-fatal-hbm-ss1-3", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 12)] = "soc-fatal-hbm-ss1-4", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 13)] = "soc-fatal-hbm-ss1-5", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 14)] = "soc-fatal-hbm-ss1-6", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 15)] = "soc-fatal-hbm-ss1-7", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 0)] = "soc-fatal-hbm-ss2-0", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 1)] = "soc-fatal-hbm-ss2-1", + 
[XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 2)] = "soc-fatal-hbm-ss2-2", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 3)] = "soc-fatal-hbm-ss2-3", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 4)] = "soc-fatal-hbm-ss2-4", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 5)] = "soc-fatal-hbm-ss2-5", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 6)] = "soc-fatal-hbm-ss2-6", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 7)] = "soc-fatal-hbm-ss2-7", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 8)] = "soc-fatal-hbm-ss3-0", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 9)] = "soc-fatal-hbm-ss3-1", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 10)] = "soc-fatal-hbm-ss3-2", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 11)] = "soc-fatal-hbm-ss3-3", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 12)] = "soc-fatal-hbm-ss3-4", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 13)] = "soc-fatal-hbm-ss3-5", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 14)] = "soc-fatal-hbm-ss3-6", + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 15)] = "soc-fatal-hbm-ss3-7", + [XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC] = "gsc-correctable-sram-ecc", + [XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] = "gsc-nonfatal-mia-shutdown", + [XE_GENL_GSC_ERROR_NONFATAL_MIA_INT] = "gsc-nonfatal-mia-int", + [XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC] = "gsc-nonfatal-sram-ecc", + [XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = "gsc-nonfatal-wdg-timeout", + [XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY] = "gsc-nonfatal-rom-parity", + [XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY] = "gsc-nonfatal-ucode-parity", + [XE_GENL_GSC_ERROR_NONFATAL_GLITCH_DET] = "gsc-nonfatal-glitch-det", + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL] = "gsc-nonfatal-fuse-pull", + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = "gsc-nonfatal-fuse-crc-check", + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_SELFMBIST] = "gsc-nonfatal-selfmbist", + [XE_GENL_GSC_ERROR_NONFATAL_AON_PARITY] = "gsc-nonfatal-aon-parity", +}; + +static const unsigned long xe_hw_error_map[] = { + [XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG] = XE_GT_HW_ERR_L3_SNG_CORR, + [XE_GENL_GT_ERROR_CORRECTABLE_GUC] = XE_GT_HW_ERR_GUC_CORR, + 
[XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER] = XE_GT_HW_ERR_SAMPLER_CORR, + [XE_GENL_GT_ERROR_CORRECTABLE_SLM] = XE_GT_HW_ERR_SLM_CORR, + [XE_GENL_GT_ERROR_CORRECTABLE_EU_IC] = XE_GT_HW_ERR_EU_IC_CORR, + [XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF] = XE_GT_HW_ERR_EU_GRF_CORR, + [XE_GENL_GT_ERROR_FATAL_ARR_BIST] = XE_GT_HW_ERR_ARR_BIST_FATAL, + [XE_GENL_GT_ERROR_FATAL_L3_DOUB] = XE_GT_HW_ERR_L3_DOUB_FATAL, + [XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK] = XE_GT_HW_ERR_L3_ECC_CHK_FATAL, + [XE_GENL_GT_ERROR_FATAL_GUC] = XE_GT_HW_ERR_GUC_FATAL, + [XE_GENL_GT_ERROR_FATAL_IDI_PAR] = XE_GT_HW_ERR_IDI_PAR_FATAL, + [XE_GENL_GT_ERROR_FATAL_SQIDI] = XE_GT_HW_ERR_SQIDI_FATAL, + [XE_GENL_GT_ERROR_FATAL_SAMPLER] = XE_GT_HW_ERR_SAMPLER_FATAL, + [XE_GENL_GT_ERROR_FATAL_SLM] = XE_GT_HW_ERR_SLM_FATAL, + [XE_GENL_GT_ERROR_FATAL_EU_IC] = XE_GT_HW_ERR_EU_IC_FATAL, + [XE_GENL_GT_ERROR_FATAL_EU_GRF] = XE_GT_HW_ERR_EU_GRF_FATAL, + [XE_GENL_GT_ERROR_FATAL_FPU] = XE_GT_HW_ERR_FPU_FATAL, + [XE_GENL_GT_ERROR_FATAL_TLB] = XE_GT_HW_ERR_TLB_FATAL, + [XE_GENL_GT_ERROR_FATAL_L3_FABRIC] = XE_GT_HW_ERR_L3_FABRIC_FATAL, + [XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE] = XE_GT_HW_ERR_SUBSLICE_CORR, + [XE_GENL_GT_ERROR_CORRECTABLE_L3BANK] = XE_GT_HW_ERR_L3BANK_CORR, + [XE_GENL_GT_ERROR_FATAL_SUBSLICE] = XE_GT_HW_ERR_SUBSLICE_FATAL, + [XE_GENL_GT_ERROR_FATAL_L3BANK] = XE_GT_HW_ERR_L3BANK_FATAL, + [XE_GENL_SGUNIT_ERROR_CORRECTABLE] = XE_TILE_HW_ERR_SGUNIT_CORR, + [XE_GENL_SGUNIT_ERROR_NONFATAL] = XE_TILE_HW_ERR_SGUNIT_NONFATAL, + [XE_GENL_SGUNIT_ERROR_FATAL] = XE_TILE_HW_ERR_SGUNIT_FATAL, + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_0] = XE_SOC_HW_ERR_CSC_PSF_CMD_FATAL, + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_1] = XE_SOC_HW_ERR_CSC_PSF_CMP_FATAL, + [XE_GENL_SOC_ERROR_FATAL_PSF_CSC_2] = XE_SOC_HW_ERR_CSC_PSF_REQ_FATAL, + [XE_GENL_SOC_ERROR_FATAL_PUNIT] = XE_SOC_HW_ERR_PUNIT_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_PSF_0] = XE_SOC_HW_ERR_PCIE_PSF_CMD_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_PSF_1] = XE_SOC_HW_ERR_PCIE_PSF_CMP_FATAL, + 
[XE_GENL_PVC_SOC_ERROR_FATAL_PSF_2] = XE_SOC_HW_ERR_PCIE_PSF_REQ_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_CD0_MDFI] = XE_SOC_HW_ERR_ANR_MDFI_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_EAST] = XE_SOC_HW_ERR_MDFI_T2T_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_SOUTH] = XE_SOC_HW_ERR_MDFI_T2C_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 0)] = XE_SOC_HW_ERR_HBM0_CHNL0_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 1)] = XE_SOC_HW_ERR_HBM0_CHNL1_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 2)] = XE_SOC_HW_ERR_HBM0_CHNL2_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 3)] = XE_SOC_HW_ERR_HBM0_CHNL3_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 4)] = XE_SOC_HW_ERR_HBM0_CHNL4_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 5)] = XE_SOC_HW_ERR_HBM0_CHNL5_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 6)] = XE_SOC_HW_ERR_HBM0_CHNL6_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 7)] = XE_SOC_HW_ERR_HBM0_CHNL7_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 8)] = XE_SOC_HW_ERR_HBM1_CHNL0_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 9)] = XE_SOC_HW_ERR_HBM1_CHNL1_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 10)] = XE_SOC_HW_ERR_HBM1_CHNL2_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 11)] = XE_SOC_HW_ERR_HBM1_CHNL3_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 12)] = XE_SOC_HW_ERR_HBM1_CHNL4_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 13)] = XE_SOC_HW_ERR_HBM1_CHNL5_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 14)] = XE_SOC_HW_ERR_HBM1_CHNL6_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(0, 15)] = XE_SOC_HW_ERR_HBM1_CHNL7_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 0)] = XE_SOC_HW_ERR_HBM2_CHNL0_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 1)] = XE_SOC_HW_ERR_HBM2_CHNL1_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 2)] = XE_SOC_HW_ERR_HBM2_CHNL2_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 3)] = XE_SOC_HW_ERR_HBM2_CHNL3_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 4)] = XE_SOC_HW_ERR_HBM2_CHNL4_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 5)] = XE_SOC_HW_ERR_HBM2_CHNL5_FATAL, + 
[XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 6)] = XE_SOC_HW_ERR_HBM2_CHNL6_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 7)] = XE_SOC_HW_ERR_HBM2_CHNL7_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 8)] = XE_SOC_HW_ERR_HBM3_CHNL0_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 9)] = XE_SOC_HW_ERR_HBM3_CHNL1_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 10)] = XE_SOC_HW_ERR_HBM3_CHNL2_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 11)] = XE_SOC_HW_ERR_HBM3_CHNL3_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 12)] = XE_SOC_HW_ERR_HBM3_CHNL4_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 13)] = XE_SOC_HW_ERR_HBM3_CHNL5_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 14)] = XE_SOC_HW_ERR_HBM3_CHNL6_FATAL, + [XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 15)] = XE_SOC_HW_ERR_HBM3_CHNL7_FATAL, + [XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC] = XE_GSC_HW_ERR_SRAM_CORR, + [XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] = XE_GSC_HW_ERR_MIA_SHUTDOWN_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_MIA_INT] = XE_GSC_HW_ERR_MIA_INTERNAL_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC] = XE_GSC_HW_ERR_SRAM_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = XE_GSC_HW_ERR_WDG_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY] = XE_GSC_HW_ERR_ROM_PARITY_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY] = XE_GSC_HW_ERR_UCODE_PARITY_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_GLITCH_DET] = XE_GSC_HW_ERR_VLT_GLITCH_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL] = XE_GSC_HW_ERR_FUSE_PULL_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = XE_GSC_HW_ERR_FUSE_CRC_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_FUSE_SELFMBIST] = XE_GSC_HW_ERR_SELF_MBIST_UNCOR, + [XE_GENL_GSC_ERROR_NONFATAL_AON_PARITY] = XE_GSC_HW_ERR_AON_RF_PARITY_UNCOR, +}; + +static unsigned int config_gt_id(const u64 config) +{ + return config >> __XE_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __XE_PMU_GT_SHIFT); +} + +static bool is_gt_error(const u64 config) +{ + unsigned int error; + + error = config_counter(config); + if (error <= 
XE_GENL_GT_ERROR_FATAL_FPU) + return true; + + return false; +} + +static bool is_gt_vector_error(const u64 config) +{ + unsigned int error; + + error = config_counter(config); + if (error >= XE_GENL_GT_ERROR_FATAL_TLB && + error <= XE_GENL_GT_ERROR_FATAL_L3BANK) + return true; + + return false; +} + +static bool is_pvc_invalid_gt_errors(const u64 config) +{ + switch (config_counter(config)) { + case XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG: + case XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER: + case XE_GENL_GT_ERROR_FATAL_ARR_BIST: + case XE_GENL_GT_ERROR_FATAL_L3_DOUB: + case XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK: + case XE_GENL_GT_ERROR_FATAL_IDI_PAR: + case XE_GENL_GT_ERROR_FATAL_SQIDI: + case XE_GENL_GT_ERROR_FATAL_SAMPLER: + case XE_GENL_GT_ERROR_FATAL_EU_IC: + return true; + default: + return false; + } +} + +static bool is_gsc_hw_error(const u64 config) +{ + if (config_counter(config) >= XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC && + config_counter(config) <= XE_GENL_GSC_ERROR_NONFATAL_AON_PARITY) + return true; + + return false; +} + +static bool is_soc_error(const u64 config) +{ + if (config_counter(config) >= XE_GENL_SOC_ERROR_FATAL_PSF_CSC_0 && + config_counter(config) <= XE_GENL_PVC_SOC_ERROR_FATAL_HBM(1, 15)) + return true; + + return false; +} + +static int +config_status(struct xe_device *xe, u64 config) +{ + unsigned int gt_id = config_gt_id(config); + struct xe_gt *gt = xe_device_get_gt(xe, gt_id); + + if (!IS_DGFX(xe)) + return -ENODEV; + + if (gt->info.type == XE_GT_TYPE_UNINITIALIZED) + return -ENOENT; + + /* GSC HW ERRORS are present on root tile of + * platform supporting MEMORY SPARING only + */ + if (is_gsc_hw_error(config) && !(xe->info.platform == XE_PVC && !gt_id)) + return -ENODEV; + + /* GT vectors error are valid on Platforms supporting error vectors only */ + if (is_gt_vector_error(config) && xe->info.platform != XE_PVC) + return -ENODEV; + + /* Skip gt errors not supported on pvc */ + if (is_pvc_invalid_gt_errors(config) && xe->info.platform == XE_PVC) 
+ return -ENODEV; + + /* FATAL FPU error is valid on PVC only */ + if (config_counter(config) == XE_GENL_GT_ERROR_FATAL_FPU && + !(xe->info.platform == XE_PVC)) + return -ENODEV; + + if (is_soc_error(config) && !(xe->info.platform == XE_PVC)) + return -ENODEV; + + return (config_counter(config) >= + ARRAY_SIZE(xe_hw_error_map)) ? -ENOENT : 0; +} + +static u64 get_counter_value(struct xe_device *xe, u64 config) { + const unsigned int gt_id = config_gt_id(config); + struct xe_gt *gt = xe_device_get_gt(xe, gt_id); + unsigned int id = config_counter(config); + + if (is_gt_error(config) || is_gt_vector_error(config)) + return gt->errors.count[xe_hw_error_map[id]]; + + return gt->tile->errors.count[xe_hw_error_map[id]]; +} + +int fill_error_details(struct xe_device *xe, struct genl_info *info, struct sk_buff *new_msg) +{ + struct nlattr *entry_attr; + bool counter = false; + struct xe_gt *gt; + int i, j; + + BUILD_BUG_ON(ARRAY_SIZE(xe_hw_error_events) != + ARRAY_SIZE(xe_hw_error_map)); + + if (info->genlhdr->cmd == DRM_RAS_CMD_READ_ALL) + counter = true; + + entry_attr = nla_nest_start(new_msg, DRM_RAS_ATTR_QUERY_REPLY); + if (!entry_attr) + return -EMSGSIZE; + + for_each_gt(gt, xe, j) { + char str[MAX_ERROR_NAME]; + u64 val; + + for (i = 0; i < ARRAY_SIZE(xe_hw_error_events); i++) { + u64 config = XE_HW_ERROR(j, i); + + if (config_status(xe, config)) + continue; + + /* should this be cleared everytime */ + snprintf(str, sizeof(str), "error-gt%d-%s", j, xe_hw_error_events[i]); + + if (nla_put_string(new_msg, DRM_RAS_ATTR_ERROR_NAME, str)) + goto err; + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_ID, config, DRM_ATTR_PAD)) + goto err; + if (counter) { + val = get_counter_value(xe, config); + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD)) + goto err; + } + } + } + + nla_nest_end(new_msg, entry_attr); + return 0; +err: + drm_dbg_driver(&xe->drm, "msg buff is small\n"); + nla_nest_cancel(new_msg, entry_attr); + nlmsg_free(new_msg); + + 
return -EMSGSIZE; +} + +static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + int retries = 2; + void *usrhdr; + int ret = 0; + + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + ret = fill_error_details(xe, info, new_msg); + if (!ret) + break; + + msg_size += NLMSG_DEFAULT_SIZE; + } while (retries--); + + if (!ret) + ret = drm_genl_send(new_msg, info, usrhdr); + + return ret; } static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) { - return 0; + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + void *usrhdr; + int ret = 0; + int retries = 2; + u64 config, val; + + config = nla_get_u64(info->attrs[DRM_RAS_ATTR_ERROR_ID]); + ret = config_status(xe, config); + if (ret) + return ret; + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + val = get_counter_value(xe, config); + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD)) { + msg_size += NLMSG_DEFAULT_SIZE; + continue; + } + + break; + } while (retries--); + + ret = drm_genl_send(new_msg, info, usrhdr); + + return ret; } /* driver callbacks to DRM netlink commands*/ diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index c45833defcc7..d9d45adde27c 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -1102,6 +1102,67 @@ struct drm_xe_vm_madvise { #define XE_PMU_MEDIA_GROUP_BUSY(gt) ___XE_PMU_OTHER(gt, 3) #define XE_PMU_ANY_ENGINE_GROUP_BUSY(gt) ___XE_PMU_OTHER(gt, 4) +/** + * DOC: XE GENL netlink event IDs + * + */ +#define XE_HW_ERROR(gt, id) \ + ((id) | ((__u64)(gt) << __XE_PMU_GT_SHIFT)) + +#define XE_GENL_GT_ERROR_CORRECTABLE_L3_SNG (0) +#define XE_GENL_GT_ERROR_CORRECTABLE_GUC 
(1) +#define XE_GENL_GT_ERROR_CORRECTABLE_SAMPLER (2) +#define XE_GENL_GT_ERROR_CORRECTABLE_SLM (3) +#define XE_GENL_GT_ERROR_CORRECTABLE_EU_IC (4) +#define XE_GENL_GT_ERROR_CORRECTABLE_EU_GRF (5) +#define XE_GENL_GT_ERROR_FATAL_ARR_BIST (6) +#define XE_GENL_GT_ERROR_FATAL_L3_DOUB (7) +#define XE_GENL_GT_ERROR_FATAL_L3_ECC_CHK (8) +#define XE_GENL_GT_ERROR_FATAL_GUC (9) +#define XE_GENL_GT_ERROR_FATAL_IDI_PAR (10) +#define XE_GENL_GT_ERROR_FATAL_SQIDI (11) +#define XE_GENL_GT_ERROR_FATAL_SAMPLER (12) +#define XE_GENL_GT_ERROR_FATAL_SLM (13) +#define XE_GENL_GT_ERROR_FATAL_EU_IC (14) +#define XE_GENL_GT_ERROR_FATAL_EU_GRF (15) +#define XE_GENL_GT_ERROR_FATAL_FPU (16) +#define XE_GENL_GT_ERROR_FATAL_TLB (17) +#define XE_GENL_GT_ERROR_FATAL_L3_FABRIC (18) +#define XE_GENL_GT_ERROR_CORRECTABLE_SUBSLICE (19) +#define XE_GENL_GT_ERROR_CORRECTABLE_L3BANK (20) +#define XE_GENL_GT_ERROR_FATAL_SUBSLICE (21) +#define XE_GENL_GT_ERROR_FATAL_L3BANK (22) +#define XE_GENL_SGUNIT_ERROR_CORRECTABLE (23) +#define XE_GENL_SGUNIT_ERROR_NONFATAL (24) +#define XE_GENL_SGUNIT_ERROR_FATAL (25) +#define XE_GENL_SOC_ERROR_FATAL_PSF_CSC_0 (26) +#define XE_GENL_SOC_ERROR_FATAL_PSF_CSC_1 (27) +#define XE_GENL_SOC_ERROR_FATAL_PSF_CSC_2 (28) +#define XE_GENL_SOC_ERROR_FATAL_PUNIT (29) +#define XE_GENL_PVC_SOC_ERROR_FATAL_PSF_0 (30) +#define XE_GENL_PVC_SOC_ERROR_FATAL_PSF_1 (31) +#define XE_GENL_PVC_SOC_ERROR_FATAL_PSF_2 (32) +#define XE_GENL_PVC_SOC_ERROR_FATAL_CD0_MDFI (33) +#define XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_EAST (34) +#define XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_SOUTH (35) + +#define XE_GENL_PVC_SOC_ERROR_FATAL_HBM(ss, n)\ + (XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_SOUTH + 0x1 + (ss) * 0x10 + (n)) + +/* 67 is the last ID used by SOC errors */ +#define XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC (68) +#define XE_GENL_GSC_ERROR_NONFATAL_MIA_SHUTDOWN (69) +#define XE_GENL_GSC_ERROR_NONFATAL_MIA_INT (70) +#define XE_GENL_GSC_ERROR_NONFATAL_SRAM_ECC (71) +#define XE_GENL_GSC_ERROR_NONFATAL_WDG_TIMEOUT (72) 
+#define XE_GENL_GSC_ERROR_NONFATAL_ROM_PARITY (73) +#define XE_GENL_GSC_ERROR_NONFATAL_UCODE_PARITY (74) +#define XE_GENL_GSC_ERROR_NONFATAL_GLITCH_DET (75) +#define XE_GENL_GSC_ERROR_NONFATAL_FUSE_PULL (76) +#define XE_GENL_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK (77) +#define XE_GENL_GSC_ERROR_NONFATAL_FUSE_SELFMBIST (78) +#define XE_GENL_GSC_ERROR_NONFATAL_AON_PARITY (79) + #if defined(__cplusplus) } #endif From patchwork Sun Oct 8 09:26:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13412668 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 80B73E95A67 for ; Sun, 8 Oct 2023 09:24:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 74BF010E102; Sun, 8 Oct 2023 09:24:01 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.151]) by gabe.freedesktop.org (Postfix) with ESMTPS id A168F10E102; Sun, 8 Oct 2023 09:23:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696757039; x=1728293039; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=RCo+T55m+O3hw6gFXNvYg9Y6wQ1kVs74jscZ5+FkoRE=; b=BVYzXPwHHZjU5LZYSlzzPZdAFI3UMUjXppO30pmDdfZRkejRrc/IwQkV 7eEMyc/b6VwsQQxe7ry1JSq3LCIYonChtarGF5jAdq8N6SNiS2wowSPUP FFQ5hzHsdjCcnaxcQ8jzr/WrZdxuU7KNRBUPEa/nWi+VwOEW05U9sQ7eP +WJtF5yHgsaWEqVNiNhpoJqub5NJQZ0wk1/2p9f4wa76J3dFmiJqli5kc WOsrBUdaCpUgxGJuG2910h9DIX9lcA3QYETujIrG+gjShWVPN+x81L9H5 bLOvH+gYiaF6bMqvDAb96vDKve41qNyBsZPuU7LMWpVsceCbm4ia3BdkJ A==; X-IronPort-AV: 
E=McAfee;i="6600,9927,10856"; a="364278991" X-IronPort-AV: E=Sophos;i="6.03,207,1694761200"; d="scan'208";a="364278991" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2023 02:23:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10856"; a="1084007108" X-IronPort-AV: E=Sophos;i="6.03,207,1694761200"; d="scan'208";a="1084007108" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2023 02:23:55 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com Subject: [RFC v1 4/5] drm/netlink: Define multicast groups Date: Sun, 8 Oct 2023 14:56:15 +0530 Message-Id: <20231008092616.1457995-5-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com> References: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" The netlink subsystem supports event notifications to userspace. We define two multicast groups, one for correctable and one for uncorrectable errors, to which userspace can subscribe to be notified when any of those errors occurs. The group names are local to the driver's genl netlink family. 
Signed-off-by: Aravind Iddamsetty --- drivers/gpu/drm/drm_netlink.c | 2 ++ include/drm/drm_netlink.h | 10 ++++++++++ include/uapi/drm/drm_netlink.h | 4 ++++ 3 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c index 843b2919b6ed..0c2dd62a9a8e 100644 --- a/drivers/gpu/drm/drm_netlink.c +++ b/drivers/gpu/drm/drm_netlink.c @@ -124,6 +124,8 @@ static void drm_genl_family_init(struct drm_device *dev) dev->drm_genl_family.ops = drm_genl_ops; dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops); dev->drm_genl_family.maxattr = DRM_ATTR_MAX; + dev->drm_genl_family.mcgrps = drm_event_mcgrps; + dev->drm_genl_family.n_mcgrps = ARRAY_SIZE(drm_event_mcgrps); dev->drm_genl_family.module = dev->dev->driver->owner; } diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h index 60f7ee39ae0a..876f73087b18 100644 --- a/include/drm/drm_netlink.h +++ b/include/drm/drm_netlink.h @@ -28,6 +28,16 @@ static const struct nla_policy drm_attr_policy_read_one[DRM_ATTR_MAX + 1] = { [DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 }, }; +enum mcgrps_events { + DRM_GENL_MCAST_CORR_ERR, + DRM_GENL_MCAST_UNCORR_ERR, +}; + +static const struct genl_multicast_group drm_event_mcgrps[] = { + [DRM_GENL_MCAST_CORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_CORR_ERR, }, + [DRM_GENL_MCAST_UNCORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR, }, +}; + int drm_genl_register(struct drm_device *dev); void drm_genl_exit(void); int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr); diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h index aab42147a20e..c7a0ce5b4624 100644 --- a/include/uapi/drm/drm_netlink.h +++ b/include/uapi/drm/drm_netlink.h @@ -26,6 +26,8 @@ #define _DRM_NETLINK_H_ #define DRM_GENL_VERSION 1 +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" #if defined(__cplusplus) extern "C" { @@ -43,6 +45,8 @@ enum 
drm_genl_error_cmds { DRM_RAS_CMD_READ_ONE, /** @DRM_RAS_CMD_READ_ALL: Command to get counters of all errors */ DRM_RAS_CMD_READ_ALL, + /** @DRM_RAS_CMD_ERROR_EVENT: Command sent as part of multicast event */ + DRM_RAS_CMD_ERROR_EVENT, __DRM_CMD_MAX, DRM_CMD_MAX = __DRM_CMD_MAX - 1, From patchwork Sun Oct 8 09:26:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13412669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6FF61E95A67 for ; Sun, 8 Oct 2023 09:24:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9F8CF10E09B; Sun, 8 Oct 2023 09:24:04 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.151]) by gabe.freedesktop.org (Postfix) with ESMTPS id 33E7110E0E7; Sun, 8 Oct 2023 09:24:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696757043; x=1728293043; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=IAIHPpflRx8yyRlwFPvu2i4K3FV/bD3IHK6ZmbpZmf0=; b=MtDuRL4/YewwtVDJLO0DjO72ukhjTbFltl2C6udSzyCwdPMW0d8r9aqW TgR65aZ+8wOHsQYp61OltOZwcKVFGlOSdMdfLS8q8oPdli0ZGN7bxJMdz YHwI0f0GMtC0IRy8Uj+jmj0EdDo+G4Mvgg12PAaugf2Hen9tdUWFU7D6W FW4cCm9c/fv6af9hyqpUAe70HL8c3jDIAyRdGSe6HDyYZ+TTiqyXZUTfo 8XL9OI2npY27iNxwC4oOmMvthm4qqJY9DGrtmc4S9dkW6sOfp7hJYsF05 AjBoR1tTI8ElzhWE6g/9G3qrswi+G6T+ytUHL1qrs+XU5kFhtZf7683Lq A==; X-IronPort-AV: E=McAfee;i="6600,9927,10856"; a="364278999" X-IronPort-AV: E=Sophos;i="6.03,207,1694761200"; d="scan'208";a="364278999" Received: from 
fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2023 02:24:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10856"; a="1084007117" X-IronPort-AV: E=Sophos;i="6.03,207,1694761200"; d="scan'208";a="1084007117" Received: from aravind-dev.iind.intel.com ([10.145.162.146]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Oct 2023 02:23:59 -0700 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, alexander.deucher@amd.com, airlied@gmail.com, daniel@ffwll.ch, joonas.lahtinen@linux.intel.com, ogabbay@kernel.org, ttayar@habana.ai, Hawking.Zhang@amd.com, Harish.Kasiviswanathan@amd.com, Felix.Kuehling@amd.com, Luben.Tuikov@amd.com Subject: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error Date: Sun, 8 Oct 2023 14:56:16 +0530 Message-Id: <20231008092616.1457995-6-aravind.iddamsetty@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com> References: <20231008092616.1457995-1-aravind.iddamsetty@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Whenever a correctable or an uncorrectable error happens an event is sent to the corresponding listeners of these groups. 
v2: Rebase Signed-off-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_hw_error.c | 33 ++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c index a777c887a7be..8d52f051c36e 100644 --- a/drivers/gpu/drm/xe/xe_hw_error.c +++ b/drivers/gpu/drm/xe/xe_hw_error.c @@ -773,6 +773,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err) (HARDWARE_ERROR_MAX << 1) + 1); } +static void +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err) +{ + struct sk_buff *msg; + void *hdr; + + if (!xe->drm.drm_genl_family.module) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC); + if (!msg) { + drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n"); + return; + } + + hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT); + if (!hdr) { + drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n"); + nlmsg_free(msg); + return; + } + + genlmsg_end(msg, hdr); + + genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0, + hw_err ? + DRM_GENL_MCAST_UNCORR_ERR + : DRM_GENL_MCAST_CORR_ERR, + GFP_ATOMIC); +} + static void xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err) { @@ -826,6 +857,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er } xe_mmio_write32(mmio, DEV_ERR_STAT_REG(hw_err), errsrc); + + generate_netlink_event(tile_to_xe(tile), hw_err); unlock: spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags); }
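
Note for readers of the series: the error IDs carried in DRM_RAS_ATTR_ERROR_ID pack a GT index above a counter ID, as XE_HW_ERROR(), config_gt_id() and config_counter() do in the xe patch earlier in this series. The sketch below is a userspace-compilable illustration of that packing, not code from the patch. All SKETCH_/sketch_ names are hypothetical, and the shift value 56 is an assumption standing in for __XE_PMU_GT_SHIFT, whose definition is not shown in this series. It also checks the header comment that 67 is the last ID used by SOC errors.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for __XE_PMU_GT_SHIFT; the real value is not shown here. */
#define SKETCH_GT_SHIFT 56

/* Same shape as XE_HW_ERROR(gt, id): counter ID in the low bits, GT above it. */
#define SKETCH_HW_ERROR(gt, id) \
	((uint64_t)(id) | ((uint64_t)(gt) << SKETCH_GT_SHIFT))

/* Values copied from the uapi defines in the patch. */
#define SKETCH_MDFI_SOUTH 35	/* XE_GENL_PVC_SOC_ERROR_FATAL_MDFI_SOUTH */
#define SKETCH_GSC_FIRST  68	/* XE_GENL_GSC_ERROR_CORRECTABLE_SRAM_ECC */

/* Same arithmetic as XE_GENL_PVC_SOC_ERROR_FATAL_HBM(ss, n). */
#define SKETCH_HBM(ss, n) (SKETCH_MDFI_SOUTH + 0x1 + (ss) * 0x10 + (n))

/* Counterpart of config_gt_id(): recover the GT index. */
static unsigned int sketch_gt_id(uint64_t config)
{
	return (unsigned int)(config >> SKETCH_GT_SHIFT);
}

/* Counterpart of config_counter(): mask off the GT bits. */
static uint64_t sketch_counter(uint64_t config)
{
	return config & ~(~0ULL << SKETCH_GT_SHIFT);
}

/* Round-trip one ID and check the HBM range against the GSC range. */
static void sketch_self_check(void)
{
	uint64_t config = SKETCH_HW_ERROR(1, 16);	/* GT 1, FATAL_FPU */

	assert(sketch_gt_id(config) == 1);
	assert(sketch_counter(config) == 16);
	assert(SKETCH_HBM(0, 0) == 36);			/* first HBM ID */
	assert(SKETCH_HBM(1, 15) == 67);		/* last SOC ID */
	assert(SKETCH_HBM(1, 15) + 1 == SKETCH_GSC_FIRST);
}
```

Calling sketch_self_check() from a small test program should pass silently under these assumptions. The round trip holds for any shift wide enough to keep the largest counter ID (79) below the GT bits, so the checks do not depend on the assumed value of 56.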