From patchwork Fri Aug 25 11:55:27 2023
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org
Cc: Tomer Tayar, dri-devel@lists.freedesktop.org
Subject: [RFC v2 1/5] drm/netlink: Add netlink infrastructure
Date: Fri, 25 Aug 2023 17:25:27 +0530
Message-Id: <20230825115531.800574-2-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com>

Define the netlink registration interface, along with commands and
attributes that can be shared by all DRM drivers. This patch uses the
generic netlink (genl) family mechanism to expose device statistics; at
present it defines the commands used to expose RAS error counters.
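The dispatch path this patch sets up (one genl family per DRM device, a
global map from family id to device, and core doit handlers that forward
each command to the driver's genl_ops table indexed by command number) can
be modeled in plain userspace C. This is a hypothetical sketch, not kernel
code: model_device, model_register(), model_dispatch() and the example
counters are illustrative names, and a fixed-size array stands in for
drm_dev_xarray. Unlike drm_genl_list_errors() in this patch, the model
checks the device lookup and the command range before indexing the driver
table.

```c
#include <errno.h>
#include <stdint.h>

/* Command numbers copied from enum error_cmds in include/uapi/drm/drm_netlink.h. */
enum error_cmds {
	DRM_CMD_UNSPEC,
	DRM_RAS_CMD_QUERY,
	DRM_RAS_CMD_READ_ONE,
	DRM_RAS_CMD_READ_ALL,

	__DRM_CMD_MAX,
	DRM_CMD_MAX = __DRM_CMD_MAX - 1,
};

struct model_device;

/* Hypothetical stand-in for struct driver_genl_ops: one callback per command. */
struct model_genl_ops {
	int (*doit)(struct model_device *dev, uint64_t arg);
};

struct model_device {
	int family_id;                         /* id assigned at registration */
	const struct model_genl_ops *genl_ops; /* indexed by command number */
};

/* Fixed-size table standing in for drm_dev_xarray (family id -> device). */
#define MODEL_MAX_FAMILIES 8
static struct model_device *model_registry[MODEL_MAX_FAMILIES];

static int model_register(struct model_device *dev, int family_id)
{
	if (family_id < 0 || family_id >= MODEL_MAX_FAMILIES)
		return -EINVAL;
	if (model_registry[family_id])
		return -EBUSY;
	dev->family_id = family_id;
	model_registry[family_id] = dev;
	return 0;
}

/* Core-side forwarding, analogous to drm_genl_read_error(), plus checks. */
static int model_dispatch(int family_id, int cmd, uint64_t arg)
{
	struct model_device *dev;

	if (family_id < 0 || family_id >= MODEL_MAX_FAMILIES)
		return -ENODEV;
	dev = model_registry[family_id];
	if (!dev || !dev->genl_ops)
		return -ENODEV;
	if (cmd <= DRM_CMD_UNSPEC || cmd > DRM_CMD_MAX)
		return -EOPNOTSUPP;
	if (!dev->genl_ops[cmd].doit)
		return -EOPNOTSUPP;
	return dev->genl_ops[cmd].doit(dev, arg);
}

/* Hypothetical driver: error counters indexed by error id. */
static const uint64_t example_counters[] = { 5, 0, 12 };

/* Returns the counter as a non-negative int (a model simplification). */
static int example_read_one(struct model_device *dev, uint64_t id)
{
	(void)dev;
	if (id >= sizeof(example_counters) / sizeof(example_counters[0]))
		return -EINVAL;
	return (int)example_counters[id];
}

/*
 * Driver table sized by __DRM_CMD_MAX so every valid command index is in
 * bounds; QUERY and READ_ALL are deliberately left NULL here.
 */
static const struct model_genl_ops example_ops[__DRM_CMD_MAX] = {
	[DRM_RAS_CMD_READ_ONE] = { .doit = example_read_one },
};
```

With the table above, dispatching DRM_RAS_CMD_READ_ONE for error id 2 on a
registered family returns 12, while DRM_RAS_CMD_QUERY returns -EOPNOTSUPP
because no callback is installed for it.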
v2: define common interfaces to the genl netlink subsystem that all DRM
drivers can leverage (Tomer Tayar)

Cc: Tomer Tayar
Signed-off-by: Aravind Iddamsetty
---
 drivers/gpu/drm/Makefile       |   1 +
 drivers/gpu/drm/drm_drv.c      |   7 ++
 drivers/gpu/drm/drm_netlink.c  | 175 +++++++++++++++++++++++++++++++++
 include/drm/drm_device.h       |   8 ++
 include/drm/drm_drv.h          |  14 +++
 include/drm/drm_netlink.h      |  39 ++++++++
 include/uapi/drm/drm_netlink.h |  63 ++++++++++++
 7 files changed, 307 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_netlink.c
 create mode 100644 include/drm/drm_netlink.h
 create mode 100644 include/uapi/drm/drm_netlink.h

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 795abcc0ca7e..c91f947edf07 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -35,6 +35,7 @@ drm-y := \
 	drm_mode_object.o \
 	drm_modes.o \
 	drm_modeset_lock.o \
+	drm_netlink.o \
 	drm_plane.o \
 	drm_prime.o \
 	drm_print.o \
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index cee0cc522ed9..c9cfbf76da64 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -928,6 +928,12 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
 	if (ret)
 		goto err_minors;
 
+	if (drm_core_check_feature(dev, DRIVER_NETLINK)) {
+		ret = drm_genl_register(dev);
+		if (ret)
+			goto err_minors;
+	}
+
 	ret = create_compat_control_link(dev);
 	if (ret)
 		goto err_minors;
@@ -1062,6 +1068,7 @@ static void drm_core_exit(void)
 {
 	drm_privacy_screen_lookup_exit();
 	accel_core_exit();
+	drm_genl_exit();
 	unregister_chrdev(DRM_MAJOR, "drm");
 	debugfs_remove(drm_debugfs_root);
 	drm_sysfs_destroy();
diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c
new file mode 100644
index 000000000000..e41127963968
--- /dev/null
+++ b/drivers/gpu/drm/drm_netlink.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+DEFINE_XARRAY(drm_dev_xarray);
+
+/**
+ * drm_genl_send - Send genl command
+ * @msg: socket buffer
+ * @info: receiver information
+ * @usrhdr: pointer to user specific header in the message buffer
+ *
+ * RETURNS:
+ * 0 on success and negative error code on failure
+ */
+int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr)
+{
+	int ret;
+
+	genlmsg_end(msg, usrhdr);
+
+	ret = genlmsg_reply(msg, info);
+	if (ret)
+		nlmsg_free(msg);
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_genl_send);
+
+/**
+ * drm_genl_alloc_msg - allocate genl message buffer
+ * @dev: drm_device for which the message is being allocated
+ * @info: receiver information
+ * @usrhdr: pointer to user specific header in the message buffer
+ *
+ * RETURNS:
+ * pointer to new allocated buffer on success, NULL on failure
+ */
+struct sk_buff *
+drm_genl_alloc_msg(struct drm_device *dev,
+		   struct genl_info *info,
+		   size_t msg_size, void **usrhdr)
+{
+	struct sk_buff *new_msg;
+
+	new_msg = genlmsg_new(msg_size, GFP_KERNEL);
+	if (!new_msg)
+		return new_msg;
+
+	*usrhdr = genlmsg_put_reply(new_msg, info, &dev->drm_genl_family, 0, info->genlhdr->cmd);
+	if (!*usrhdr) {
+		nlmsg_free(new_msg);
+		new_msg = NULL;
+	}
+
+	return new_msg;
+}
+EXPORT_SYMBOL(drm_genl_alloc_msg);
+
+static struct drm_device *genl_to_dev(struct genl_info *info)
+{
+	return xa_load(&drm_dev_xarray, info->nlhdr->nlmsg_type);
+}
+
+static int drm_genl_list_errors(struct sk_buff *msg, struct genl_info *info)
+{
+	struct drm_device *dev = genl_to_dev(info);
+
+	if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit))
+		return -EOPNOTSUPP;
+
+	return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info);
+}
+
+static int drm_genl_read_error(struct sk_buff *msg, struct genl_info *info)
+{
+	struct drm_device *dev = genl_to_dev(info);
+
+	if (WARN_ON(!dev->driver->genl_ops[info->genlhdr->cmd].doit))
+		return -EOPNOTSUPP;
+
+	return dev->driver->genl_ops[info->genlhdr->cmd].doit(dev, msg, info);
+}
+
+/* drm genl operations definition */
+const struct genl_ops drm_genl_ops[] = {
+	{
+		.cmd = DRM_RAS_CMD_QUERY,
+		.doit = drm_genl_list_errors,
+		.policy = drm_attr_policy_query,
+	},
+	{
+		.cmd = DRM_RAS_CMD_READ_ONE,
+		.doit = drm_genl_read_error,
+		.policy = drm_attr_policy_read_one,
+	},
+	{
+		.cmd = DRM_RAS_CMD_READ_ALL,
+		.doit = drm_genl_list_errors,
+		.policy = drm_attr_policy_query,
+	},
+};
+
+static void drm_genl_family_init(struct drm_device *dev)
+{
+	/* Use drm primary node name eg: card0 to name the genl family */
+	snprintf(dev->drm_genl_family.name, sizeof(dev->drm_genl_family.name), "%s", dev->primary->kdev->kobj.name);
+	dev->drm_genl_family.version = DRM_GENL_VERSION;
+	dev->drm_genl_family.parallel_ops = true;
+	dev->drm_genl_family.ops = drm_genl_ops;
+	dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops);
+	dev->drm_genl_family.maxattr = DRM_ATTR_MAX;
+	dev->drm_genl_family.module = dev->dev->driver->owner;
+}
+
+static void drm_genl_deregister(struct drm_device *dev, void *arg)
+{
+	drm_dbg_driver(dev, "unregistering genl family %s\n", dev->drm_genl_family.name);
+
+	xa_erase(&drm_dev_xarray, dev->drm_genl_family.id);
+
+	genl_unregister_family(&dev->drm_genl_family);
+}
+
+/**
+ * drm_genl_register - Register genl family
+ * @dev: drm_device for which genl family needs to be registered
+ *
+ * RETURNS:
+ * 0 on success and negative error code on failure
+ */
+int drm_genl_register(struct drm_device *dev)
+{
+	int ret;
+
+	/*
+	 * If driver_genl_ops is not set then it is expected that the driver
+	 * would initialize drm_genl_family.
+	 *
+	 */
+	if (dev->driver->genl_ops)
+		drm_genl_family_init(dev);
+
+	BUG_ON(!dev->drm_genl_family.name);
+
+	ret = genl_register_family(&dev->drm_genl_family);
+	if (ret < 0) {
+		drm_warn(dev, "genl family registration failed\n");
+		return ret;
+	}
+
+	drm_dbg_driver(dev, "genl family id %d and name %s\n", dev->drm_genl_family.id, dev->drm_genl_family.name);
+
+	xa_store(&drm_dev_xarray, dev->drm_genl_family.id, dev, GFP_KERNEL);
+
+	ret = drmm_add_action_or_reset(dev, drm_genl_deregister, NULL);
+
+	return ret;
+}
+
+/**
+ * drm_genl_exit: destroy drm_dev_xarray
+ */
+void drm_genl_exit(void)
+{
+	xa_destroy(&drm_dev_xarray);
+}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 7cf4afae2e79..a984f34a11a2 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 
 struct drm_driver;
 struct drm_minor;
@@ -326,6 +327,13 @@ struct drm_device {
 	 */
 	struct list_head debugfs_list;
 
+	/**
+	 * @drm_genl_family:
+	 *
+	 * Generic netlink family registration structure.
+	 */
+	struct genl_family drm_genl_family;
+
 	/* Everything below here is for legacy driver, never use! */
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index b419c59c4bef..df704b5d564a 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -104,6 +104,13 @@ enum drm_driver_feature {
 	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
 	 */
 	DRIVER_COMPUTE_ACCEL = BIT(7),
+	/**
+	 * @DRIVER_NETLINK
+	 *
+	 * Driver supports communicating some of the important device stats
+	 * like errors to userspace via netlink messages.
+	 */
+	DRIVER_NETLINK = BIT(8),
 
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
@@ -443,6 +450,13 @@ struct drm_driver {
 	 */
 	const struct file_operations *fops;
 
+	/**
+	 * @genl_ops:
+	 *
+	 * Drivers private callback to genl commands
+	 */
+	const struct driver_genl_ops *genl_ops;
+
 #ifdef CONFIG_DRM_LEGACY
 	/* Everything below here is for legacy driver, never use! */
 	/* private: */
diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h
new file mode 100644
index 000000000000..60f7ee39ae0a
--- /dev/null
+++ b/include/drm/drm_netlink.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __DRM_NETLINK_H__
+#define __DRM_NETLINK_H__
+
+#include
+#include
+#include
+#include
+
+struct drm_device;
+
+struct driver_genl_ops {
+	int (*doit)(struct drm_device *dev,
+		    struct sk_buff *skb,
+		    struct genl_info *info);
+};
+
+/* attribute policies */
+static const struct nla_policy drm_attr_policy_query[DRM_ATTR_MAX + 1] = {
+	[DRM_RAS_ATTR_REQUEST] = { .type = NLA_U8 },
+};
+
+static const struct nla_policy drm_attr_policy_read_one[DRM_ATTR_MAX + 1] = {
+	[DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 },
+};
+
+int drm_genl_register(struct drm_device *dev);
+void drm_genl_exit(void);
+int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr);
+struct sk_buff *
+drm_genl_alloc_msg(struct drm_device *dev,
+		   struct genl_info *info,
+		   size_t msg_size, void **usrhdr);
+#endif
+
diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h
new file mode 100644
index 000000000000..a3ccc68f782d
--- /dev/null
+++ b/include/uapi/drm/drm_netlink.h
@@ -0,0 +1,63 @@
+/*
+ * Copyright 2023 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _DRM_NETLINK_H_
+#define _DRM_NETLINK_H_
+
+#define DRM_GENL_VERSION 1
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+enum error_cmds {
+	DRM_CMD_UNSPEC,
+	/* command to list all errors names with config-id */
+	DRM_RAS_CMD_QUERY,
+	/* command to get a counter for a specific error */
+	DRM_RAS_CMD_READ_ONE,
+	/* command to get counters of all errors */
+	DRM_RAS_CMD_READ_ALL,
+
+	__DRM_CMD_MAX,
+	DRM_CMD_MAX = __DRM_CMD_MAX - 1,
+};
+
+enum error_attr {
+	DRM_ATTR_UNSPEC,
+	DRM_ATTR_PAD = DRM_ATTR_UNSPEC,
+	DRM_RAS_ATTR_REQUEST, /* NLA_U8 */
+	DRM_RAS_ATTR_QUERY_REPLY, /* NLA_NESTED */
+	DRM_RAS_ATTR_ERROR_NAME, /* NLA_NUL_STRING */
+	DRM_RAS_ATTR_ERROR_ID, /* NLA_U64 */
+	DRM_RAS_ATTR_ERROR_VALUE, /* NLA_U64 */
+
+	__DRM_ATTR_MAX,
+	DRM_ATTR_MAX = __DRM_ATTR_MAX - 1,
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif

From patchwork Fri Aug 25 11:55:28 2023
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org
Cc: Tomer Tayar, dri-devel@lists.freedesktop.org
Subject: [RFC v2 2/5] drm/xe/RAS: Register netlink capability
Date: Fri, 25 Aug 2023 17:25:28 +0530
Message-Id: <20230825115531.800574-3-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com>

Register the netlink capability with DRM and register the driver
callbacks for the DRM RAS netlink commands.

v2: Move the netlink registration parts to the DRM subsystem (Tomer Tayar)

Cc: Tomer Tayar
Signed-off-by: Aravind Iddamsetty
---
 drivers/gpu/drm/xe/Makefile          |  1 +
 drivers/gpu/drm/xe/xe_device.c       |  6 +++++-
 drivers/gpu/drm/xe/xe_device_types.h |  1 +
 drivers/gpu/drm/xe/xe_netlink.c      | 22 ++++++++++++++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/xe/xe_netlink.c

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index b84e191ba14f..2b42165bc824 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -67,6 +67,7 @@ xe-y += xe_bb.o \
 	xe_mmio.o \
 	xe_mocs.o \
 	xe_module.o \
+	xe_netlink.o \
 	xe_pat.o \
 	xe_pci.o \
 	xe_pcode.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 323356a44e7f..42a3b50069b6 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -124,6 +124,8 @@ static void xe_driver_release(struct drm_device *dev)
 	pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL);
 }
 
+extern const struct driver_genl_ops xe_genl_ops[];
+
 static struct drm_driver driver = {
 	/* Don't use MTRRs here; the Xserver or userspace app should
 	 * deal with them for Intel hardware.
@@ -131,7 +133,8 @@ static struct drm_driver driver = {
 	.driver_features =
 	    DRIVER_GEM |
 	    DRIVER_RENDER | DRIVER_SYNCOBJ |
-	    DRIVER_SYNCOBJ_TIMELINE,
+	    DRIVER_SYNCOBJ_TIMELINE |
+	    DRIVER_NETLINK,
 	.open = xe_file_open,
 	.postclose = xe_file_close,
 
@@ -143,6 +146,7 @@ static struct drm_driver driver = {
 	.dumb_create = xe_bo_dumb_create,
 	.dumb_map_offset = drm_gem_ttm_dumb_map_offset,
 	.release = &xe_driver_release,
+	.genl_ops = xe_genl_ops,
 
 	.ioctls = xe_ioctls,
 	.num_ioctls = ARRAY_SIZE(xe_ioctls),
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 682ebdd1c09e..b19053fb3130 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -10,6 +10,7 @@
 
 #include
 #include
+#include
 #include
 
 #include "xe_gt_types.h"
diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c
new file mode 100644
index 000000000000..81d785455632
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_netlink.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include "xe_device.h"
+
+static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info)
+{
+	return 0;
+}
+
+static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info)
+{
+	return 0;
+}
+
+/* driver callbacks to DRM netlink commands*/
+const struct driver_genl_ops xe_genl_ops[] = {
+	[DRM_RAS_CMD_QUERY] = { .doit = xe_genl_list_errors },
+	[DRM_RAS_CMD_READ_ONE] = { .doit = xe_genl_read_error },
+	[DRM_RAS_CMD_READ_ALL] = { .doit = xe_genl_list_errors, },
+};

From patchwork Fri Aug 25 11:55:29 2023
From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: [RFC v2 3/5] drm/xe/RAS: Expose the error counters
Date: Fri, 25 Aug 2023 17:25:29 +0530
Message-Id: <20230825115531.800574-4-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com>

Expose the various error counters supported by the hardware to userspace
through the genl subsystem, using the registered commands.
DRM_RAS_CMD_QUERY lists the error names with their config ids,
DRM_RAS_CMD_READ_ONE returns the counter value for the requested config
id, and DRM_RAS_CMD_READ_ALL lists the counters for all errors along with
their names and config ids.

Signed-off-by: Aravind Iddamsetty
---
 drivers/gpu/drm/xe/xe_netlink.c | 401 +++++++++++++++++++++++++++++++-
 include/uapi/drm/xe_drm.h       |  64 +++++
 2 files changed, 463 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_netlink.c b/drivers/gpu/drm/xe/xe_netlink.c
index 81d785455632..b32b8ada6cd0 100644
--- a/drivers/gpu/drm/xe/xe_netlink.c
+++ b/drivers/gpu/drm/xe/xe_netlink.c
@@ -2,16 +2,413 @@
 /*
  * Copyright © 2023 Intel Corporation
  */
+#include
+
 #include "xe_device.h"
 
-static int xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info)
+#define MAX_ERROR_NAME 50
+
+#define HAS_GT_ERROR_VECTORS(xe) ((xe)->info.has_gt_error_vectors)
+#define HAS_MEM_SPARING_SUPPORT(xe) ((xe)->info.has_mem_sparing)
+
+static const char * const xe_hw_error_events[] = {
+	[XE_GT_ERROR_CORRECTABLE_L3_SNG] = "correctable-l3-sng",
+	[XE_GT_ERROR_CORRECTABLE_GUC] = "correctable-guc",
+	[XE_GT_ERROR_CORRECTABLE_SAMPLER] = "correctable-sampler",
+	[XE_GT_ERROR_CORRECTABLE_SLM] = "correctable-slm",
+	[XE_GT_ERROR_CORRECTABLE_EU_IC] = "correctable-eu-ic",
+	[XE_GT_ERROR_CORRECTABLE_EU_GRF] = "correctable-eu-grf",
+	[XE_GT_ERROR_FATAL_ARR_BIST] = "fatal-array-bist",
+	[XE_GT_ERROR_FATAL_L3_DOUB] = "fatal-l3-double",
+	[XE_GT_ERROR_FATAL_L3_ECC_CHK] = "fatal-l3-ecc-checker",
+	[XE_GT_ERROR_FATAL_GUC] = "fatal-guc",
+	[XE_GT_ERROR_FATAL_IDI_PAR] = "fatal-idi-parity",
+	[XE_GT_ERROR_FATAL_SQIDI] = "fatal-sqidi",
+	[XE_GT_ERROR_FATAL_SAMPLER] = "fatal-sampler",
+	[XE_GT_ERROR_FATAL_SLM] = "fatal-slm",
+	[XE_GT_ERROR_FATAL_EU_IC] = "fatal-eu-ic",
+	[XE_GT_ERROR_FATAL_EU_GRF] = "fatal-eu-grf",
+	[XE_GT_ERROR_FATAL_FPU] = "fatal-fpu",
+	[XE_GT_ERROR_FATAL_TLB] = "fatal-tlb",
+	[XE_GT_ERROR_FATAL_L3_FABRIC] = "fatal-l3-fabric",
+	[XE_GT_ERROR_CORRECTABLE_SUBSLICE] = "correctable-subslice",
+	[XE_GT_ERROR_CORRECTABLE_L3BANK] = "correctable-l3bank",
+	[XE_GT_ERROR_FATAL_SUBSLICE] = "fatal-subslice",
+	[XE_GT_ERROR_FATAL_L3BANK] = "fatal-l3bank",
+	[XE_SGUNIT_ERROR_CORRECTABLE] = "sgunit-correctable",
+	[XE_SGUNIT_ERROR_NONFATAL] = "sgunit-nonfatal",
+	[XE_SGUNIT_ERROR_FATAL] = "sgunit-fatal",
+	[XE_SOC_ERROR_FATAL_PSF_CSC_0] = "soc-fatal-psf-csc-0",
+	[XE_SOC_ERROR_FATAL_PSF_CSC_1] = "soc-fatal-psf-csc-1",
+	[XE_SOC_ERROR_FATAL_PSF_CSC_2] = "soc-fatal-psf-csc-2",
+	[XE_SOC_ERROR_FATAL_PUNIT] = "soc-fatal-punit",
+	[XE_PVC_SOC_ERROR_FATAL_PSF_0] = "soc-fatal-psf-0",
+	[XE_PVC_SOC_ERROR_FATAL_PSF_1] = "soc-fatal-psf-1",
+	[XE_PVC_SOC_ERROR_FATAL_PSF_2] = "soc-fatal-psf-2",
+	[XE_PVC_SOC_ERROR_FATAL_CD0] = "soc-fatal-cd0",
+	[XE_PVC_SOC_ERROR_FATAL_CD0_MDFI] = "soc-fatal-cd0-mdfi",
+	[XE_PVC_SOC_ERROR_FATAL_MDFI_EAST] = "soc-fatal-mdfi-east",
+	[XE_PVC_SOC_ERROR_FATAL_MDFI_SOUTH] = "soc-fatal-mdfi-south",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 0)] = "soc-fatal-hbm-ss0-0",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 1)] = "soc-fatal-hbm-ss0-1",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 2)] = "soc-fatal-hbm-ss0-2",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 3)] = "soc-fatal-hbm-ss0-3",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 4)] = "soc-fatal-hbm-ss0-4",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 5)] = "soc-fatal-hbm-ss0-5",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 6)] = "soc-fatal-hbm-ss0-6",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 7)] = "soc-fatal-hbm-ss0-7",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 8)] = "soc-fatal-hbm-ss1-0",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 9)] = "soc-fatal-hbm-ss1-1",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 10)] = "soc-fatal-hbm-ss1-2",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 11)] = "soc-fatal-hbm-ss1-3",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 12)] = "soc-fatal-hbm-ss1-4",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 13)] = "soc-fatal-hbm-ss1-5",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 14)] = "soc-fatal-hbm-ss1-6",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 15)] = "soc-fatal-hbm-ss1-7",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 0)] = "soc-fatal-hbm-ss2-0",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 1)] = "soc-fatal-hbm-ss2-1",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 2)] = "soc-fatal-hbm-ss2-2",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 3)] = "soc-fatal-hbm-ss2-3",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 4)] = "soc-fatal-hbm-ss2-4",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 5)] = "soc-fatal-hbm-ss2-5",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 6)] = "soc-fatal-hbm-ss2-6",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 7)] = "soc-fatal-hbm-ss2-7",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 8)] = "soc-fatal-hbm-ss3-0",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 9)] = "soc-fatal-hbm-ss3-1",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 10)] = "soc-fatal-hbm-ss3-2",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 11)] = "soc-fatal-hbm-ss3-3",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 12)] = "soc-fatal-hbm-ss3-4",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 13)] = "soc-fatal-hbm-ss3-5",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 14)] = "soc-fatal-hbm-ss3-6",
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 15)] = "soc-fatal-hbm-ss3-7",
+	[XE_GSC_ERROR_CORRECTABLE_SRAM_ECC] = "gsc-correctable-sram-ecc",
+	[XE_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] = "gsc-nonfatal-mia-shutdown",
+	[XE_GSC_ERROR_NONFATAL_MIA_INT] = "gsc-nonfatal-mia-int",
+	[XE_GSC_ERROR_NONFATAL_SRAM_ECC] = "gsc-nonfatal-sram-ecc",
+	[XE_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = "gsc-nonfatal-wdg-timeout",
+	[XE_GSC_ERROR_NONFATAL_ROM_PARITY] = "gsc-nonfatal-rom-parity",
+	[XE_GSC_ERROR_NONFATAL_UCODE_PARITY] = "gsc-nonfatal-ucode-parity",
+	[XE_GSC_ERROR_NONFATAL_GLITCH_DET] = "gsc-nonfatal-glitch-det",
+	[XE_GSC_ERROR_NONFATAL_FUSE_PULL] = "gsc-nonfatal-fuse-pull",
+	[XE_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = "gsc-nonfatal-fuse-crc-check",
+	[XE_GSC_ERROR_NONFATAL_FUSE_SELFMBIST] = "gsc-nonfatal-selfmbist",
+	[XE_GSC_ERROR_NONFATAL_AON_PARITY] = "gsc-nonfatal-aon-parity",
+};
+
+static const unsigned long xe_hw_error_map[] = {
+	[XE_GT_ERROR_CORRECTABLE_L3_SNG] = INTEL_GT_HW_ERROR_COR_L3_SNG,
+	[XE_GT_ERROR_CORRECTABLE_GUC] = INTEL_GT_HW_ERROR_COR_GUC,
+	[XE_GT_ERROR_CORRECTABLE_SAMPLER] = INTEL_GT_HW_ERROR_COR_SAMPLER,
+	[XE_GT_ERROR_CORRECTABLE_SLM] = INTEL_GT_HW_ERROR_COR_SLM,
+	[XE_GT_ERROR_CORRECTABLE_EU_IC] = INTEL_GT_HW_ERROR_COR_EU_IC,
+	[XE_GT_ERROR_CORRECTABLE_EU_GRF] = INTEL_GT_HW_ERROR_COR_EU_GRF,
+	[XE_GT_ERROR_FATAL_ARR_BIST] = INTEL_GT_HW_ERROR_FAT_ARR_BIST,
+	[XE_GT_ERROR_FATAL_L3_DOUB] = INTEL_GT_HW_ERROR_FAT_L3_DOUB,
+	[XE_GT_ERROR_FATAL_L3_ECC_CHK] = INTEL_GT_HW_ERROR_FAT_L3_ECC_CHK,
+	[XE_GT_ERROR_FATAL_GUC] = INTEL_GT_HW_ERROR_FAT_GUC,
+	[XE_GT_ERROR_FATAL_IDI_PAR] = INTEL_GT_HW_ERROR_FAT_IDI_PAR,
+	[XE_GT_ERROR_FATAL_SQIDI] = INTEL_GT_HW_ERROR_FAT_SQIDI,
+	[XE_GT_ERROR_FATAL_SAMPLER] = INTEL_GT_HW_ERROR_FAT_SAMPLER,
+	[XE_GT_ERROR_FATAL_SLM] = INTEL_GT_HW_ERROR_FAT_SLM,
+	[XE_GT_ERROR_FATAL_EU_IC] = INTEL_GT_HW_ERROR_FAT_EU_IC,
+	[XE_GT_ERROR_FATAL_EU_GRF] = INTEL_GT_HW_ERROR_FAT_EU_GRF,
+	[XE_GT_ERROR_FATAL_FPU] = INTEL_GT_HW_ERROR_FAT_FPU,
+	[XE_GT_ERROR_FATAL_TLB] = INTEL_GT_HW_ERROR_FAT_TLB,
+	[XE_GT_ERROR_FATAL_L3_FABRIC] = INTEL_GT_HW_ERROR_FAT_L3_FABRIC,
+	[XE_GT_ERROR_CORRECTABLE_SUBSLICE] = INTEL_GT_HW_ERROR_COR_SUBSLICE,
+	[XE_GT_ERROR_CORRECTABLE_L3BANK] = INTEL_GT_HW_ERROR_COR_L3BANK,
+	[XE_GT_ERROR_FATAL_SUBSLICE] = INTEL_GT_HW_ERROR_FAT_SUBSLICE,
+	[XE_GT_ERROR_FATAL_L3BANK] = INTEL_GT_HW_ERROR_FAT_L3BANK,
+	[XE_SGUNIT_ERROR_CORRECTABLE] = HARDWARE_ERROR_CORRECTABLE,
+	[XE_SGUNIT_ERROR_NONFATAL] = HARDWARE_ERROR_NONFATAL,
+	[XE_SGUNIT_ERROR_FATAL] = HARDWARE_ERROR_FATAL,
+	[XE_SOC_ERROR_FATAL_PSF_CSC_0] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, SOC_PSF_CSC_0),
+	[XE_SOC_ERROR_FATAL_PSF_CSC_1] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, SOC_PSF_CSC_1),
+	[XE_SOC_ERROR_FATAL_PSF_CSC_2] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, SOC_PSF_CSC_2),
+	[XE_SOC_ERROR_FATAL_PUNIT] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_PUNIT),
+	[XE_PVC_SOC_ERROR_FATAL_PSF_0] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, PVC_SOC_PSF_0),
+	[XE_PVC_SOC_ERROR_FATAL_PSF_1] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, PVC_SOC_PSF_1),
+	[XE_PVC_SOC_ERROR_FATAL_PSF_2] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, PVC_SOC_PSF_2),
+	[XE_PVC_SOC_ERROR_FATAL_CD0] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_CD0),
+	[XE_PVC_SOC_ERROR_FATAL_CD0_MDFI] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_CD0_MDFI),
+	[XE_PVC_SOC_ERROR_FATAL_MDFI_EAST] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, PVC_SOC_MDFI_EAST),
+	[XE_PVC_SOC_ERROR_FATAL_MDFI_SOUTH] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_LOCAL, HARDWARE_ERROR_FATAL, PVC_SOC_MDFI_SOUTH),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 0)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_0),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 1)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_1),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 2)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_2),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 3)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_3),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 4)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_4),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 5)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_5),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 6)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_6),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 7)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, SOC_HBM_SS0_7),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 8)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_0),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 9)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_1),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 10)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_2),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 11)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_3),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 12)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_4),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 13)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_5),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 14)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_6),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(0, 15)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH0, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS1_7),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 0)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_0),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 1)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_1),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 2)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_2),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 3)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_3),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 4)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_4),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 5)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_5),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 6)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_6),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 7)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS2_7),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 8)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_0),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 9)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_1),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 10)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_2),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 11)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_3),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 12)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_4),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 13)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_5),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 14)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_6),
+	[XE_PVC_SOC_ERROR_FATAL_HBM(1, 15)] = SOC_ERR_INDEX(INTEL_GT_SOC_IEH1, INTEL_SOC_REG_GLOBAL, HARDWARE_ERROR_FATAL, PVC_SOC_HBM_SS3_7),
+	[XE_GSC_ERROR_CORRECTABLE_SRAM_ECC] = INTEL_GSC_HW_ERROR_COR_SRAM_ECC,
+	[XE_GSC_ERROR_NONFATAL_MIA_SHUTDOWN] =
INTEL_GSC_HW_ERROR_UNCOR_MIA_SHUTDOWN, + [XE_GSC_ERROR_NONFATAL_MIA_INT] = INTEL_GSC_HW_ERROR_UNCOR_MIA_INT, + [XE_GSC_ERROR_NONFATAL_SRAM_ECC] = INTEL_GSC_HW_ERROR_UNCOR_SRAM_ECC, + [XE_GSC_ERROR_NONFATAL_WDG_TIMEOUT] = INTEL_GSC_HW_ERROR_UNCOR_WDG_TIMEOUT, + [XE_GSC_ERROR_NONFATAL_ROM_PARITY] = INTEL_GSC_HW_ERROR_UNCOR_ROM_PARITY, + [XE_GSC_ERROR_NONFATAL_UCODE_PARITY] = INTEL_GSC_HW_ERROR_UNCOR_UCODE_PARITY, + [XE_GSC_ERROR_NONFATAL_GLITCH_DET] = INTEL_GSC_HW_ERROR_UNCOR_GLITCH_DET, + [XE_GSC_ERROR_NONFATAL_FUSE_PULL] = INTEL_GSC_HW_ERROR_UNCOR_FUSE_PULL, + [XE_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK] = INTEL_GSC_HW_ERROR_UNCOR_FUSE_CRC_CHECK, + [XE_GSC_ERROR_NONFATAL_FUSE_SELFMBIST] = INTEL_GSC_HW_ERROR_UNCOR_SELFMBIST, + [XE_GSC_ERROR_NONFATAL_AON_PARITY] = INTEL_GSC_HW_ERROR_UNCOR_AON_PARITY, +}; + +static unsigned int config_gt_id(const u64 config) +{ + return config >> __XE_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __XE_GT_SHIFT); +} + +static bool is_gt_vector_error(const u64 config) +{ + unsigned int error; + + error = config_counter(config); + if (error >= XE_GT_ERROR_FATAL_TLB && + error <= XE_GT_ERROR_FATAL_L3BANK) + return true; + + return false; +} + +static bool is_pvc_invalid_gt_errors(const u64 config) { + switch (config_counter(config)) { + case XE_GT_ERROR_CORRECTABLE_L3_SNG: + case XE_GT_ERROR_CORRECTABLE_SAMPLER: + case XE_GT_ERROR_FATAL_ARR_BIST: + case XE_GT_ERROR_FATAL_L3_DOUB: + case XE_GT_ERROR_FATAL_L3_ECC_CHK: + case XE_GT_ERROR_FATAL_IDI_PAR: + case XE_GT_ERROR_FATAL_SQIDI: + case XE_GT_ERROR_FATAL_SAMPLER: + case XE_GT_ERROR_FATAL_EU_IC: + return true; + default: + return false; + } +} + +static bool is_gsc_hw_error(const u64 config) +{ + if (config_counter(config) >= XE_GSC_ERROR_CORRECTABLE_SRAM_ECC && + config_counter(config) <= XE_GSC_ERROR_NONFATAL_AON_PARITY) + return true; + + return false; +} + +static bool is_soc_error(const u64 config) +{ + if (config_counter(config) >= 
XE_SOC_ERROR_FATAL_PSF_CSC_0 && + config_counter(config) <= XE_PVC_SOC_ERROR_FATAL_HBM(1, 15)) + return true; + + return false; +} + +static int +config_status(struct xe_device *xe, u64 config) +{ + unsigned int gt_id = config_gt_id(config); + + if (!IS_DGFX(xe)) + return -ENODEV; + + if (gt_id >= ARRAY_SIZE(xe->gt) || xe->gt[gt_id].info.type == XE_GT_TYPE_UNINITIALIZED) + return -ENOENT; + + /* GSC HW errors are present only on the root tile of + * platforms that support memory sparing + */ + if (is_gsc_hw_error(config) && !(HAS_MEM_SPARING_SUPPORT(xe) && gt_id == 0)) + return -ENODEV; + + /* GT vector errors are valid only on platforms supporting error vectors */ + if (is_gt_vector_error(config) && !HAS_GT_ERROR_VECTORS(xe)) + return -ENODEV; + + /* Skip GT errors not supported on PVC */ + if (is_pvc_invalid_gt_errors(config) && (xe->info.platform == XE_PVC)) + return -ENODEV; + + /* FATAL FPU error is valid on PVC only */ + if (config_counter(config) == XE_GT_ERROR_FATAL_FPU && + !(xe->info.platform == XE_PVC)) + return -ENODEV; + + if (is_soc_error(config) && !(xe->info.platform == XE_PVC)) + return -ENODEV; + + return (config_counter(config) >= + ARRAY_SIZE(xe_hw_error_map)) ?
-ENOENT : 0; +} + +static u64 get_counter_value(struct xe_device *xe, u64 config) +{ + const unsigned int gt_id = config_gt_id(config); + unsigned int id = config_counter(config); + + if (is_soc_error(config)) + return xa_to_value(xa_load(&xe->gt[gt_id].errors.soc, xe_hw_error_map[id])); + else if (is_gsc_hw_error(config)) + return xe->gt[gt_id].errors.gsc_hw[xe_hw_error_map[id]]; + else if (id >= XE_SGUNIT_ERROR_CORRECTABLE && + id <= XE_SGUNIT_ERROR_FATAL) + return xe->gt[gt_id].errors.sgunit[xe_hw_error_map[id]]; + else + return xe->gt[gt_id].errors.hw[xe_hw_error_map[id]]; + return 0; } -static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +static int fill_error_details(struct xe_device *xe, struct genl_info *info, struct sk_buff *new_msg) { + struct nlattr *entry_attr; + bool counter = false; + struct xe_gt *gt; + int i, j; + + BUILD_BUG_ON(ARRAY_SIZE(xe_hw_error_events) != + ARRAY_SIZE(xe_hw_error_map)); + + if (info->genlhdr->cmd == DRM_RAS_CMD_READ_ALL) + counter = true; + + entry_attr = nla_nest_start(new_msg, DRM_RAS_ATTR_QUERY_REPLY); + if (!entry_attr) + goto err; + + for_each_gt(gt, xe, j) { + char str[MAX_ERROR_NAME]; + u64 val; + + for (i = 0; i < ARRAY_SIZE(xe_hw_error_events); i++) { + u64 config = XE_HW_ERROR(j, i); + + if (config_status(xe, config)) + continue; + + /* should this be cleared every time? */ + snprintf(str, sizeof(str), "error-gt%d-%s", j, xe_hw_error_events[i]); + + if (nla_put_string(new_msg, DRM_RAS_ATTR_ERROR_NAME, str)) + goto err; + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_ID, config, DRM_ATTR_PAD)) + goto err; + if (counter) { + val = get_counter_value(xe, config); + if (nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD)) + goto err; + } + } + } + + nla_nest_end(new_msg, entry_attr); + return 0; +err: + drm_dbg_driver(&xe->drm, "msg buffer is too small\n"); + nla_nest_cancel(new_msg, entry_attr); + nlmsg_free(new_msg); + + return -EMSGSIZE; +} + +static int
xe_genl_list_errors(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + int retries = 2; + void *usrhdr; + int ret = 0; + + if (GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_REQUEST)) + return -EINVAL; + + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + ret = fill_error_details(xe, info, new_msg); + if (!ret) + break; + + msg_size += NLMSG_DEFAULT_SIZE; + } while (retries--); + + if (!ret) + ret = drm_genl_send(new_msg, info, usrhdr); + + return ret; +} + +static int xe_genl_read_error(struct drm_device *drm, struct sk_buff *msg, struct genl_info *info) +{ + struct xe_device *xe = to_xe_device(drm); + size_t msg_size = NLMSG_DEFAULT_SIZE; + struct sk_buff *new_msg; + void *usrhdr; + int ret = 0; + int retries = 2; + u64 config, val; + + if (GENL_REQ_ATTR_CHECK(info, DRM_RAS_ATTR_ERROR_ID)) + return -EINVAL; + + config = nla_get_u64(info->attrs[DRM_RAS_ATTR_ERROR_ID]); + ret = config_status(xe, config); + if (ret) + return ret; + do { + new_msg = drm_genl_alloc_msg(drm, info, msg_size, &usrhdr); + if (!new_msg) + return -ENOMEM; + + val = get_counter_value(xe, config); + ret = nla_put_u64_64bit(new_msg, DRM_RAS_ATTR_ERROR_VALUE, val, DRM_ATTR_PAD); + if (!ret) + break; + nlmsg_free(new_msg); + msg_size += NLMSG_DEFAULT_SIZE; + } while (retries--); + + if (!ret) + ret = drm_genl_send(new_msg, info, usrhdr); + + return ret; } /* driver callbacks to DRM netlink commands*/ diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index b0b80aae3ee8..8ca843003f3e 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -801,6 +801,70 @@ struct drm_xe_vm_madvise { __u64 reserved[2]; }; +/* + * HW error IDs + */ + +#define __XE_GT_SHIFT (56) + +#define XE_HW_ERROR(gt, id) \ + ((id) | ((__u64)(gt) << __XE_GT_SHIFT)) + +#define XE_GT_ERROR_CORRECTABLE_L3_SNG (0) +#define XE_GT_ERROR_CORRECTABLE_GUC
(1) +#define XE_GT_ERROR_CORRECTABLE_SAMPLER (2) +#define XE_GT_ERROR_CORRECTABLE_SLM (3) +#define XE_GT_ERROR_CORRECTABLE_EU_IC (4) +#define XE_GT_ERROR_CORRECTABLE_EU_GRF (5) +#define XE_GT_ERROR_FATAL_ARR_BIST (6) +#define XE_GT_ERROR_FATAL_L3_DOUB (7) +#define XE_GT_ERROR_FATAL_L3_ECC_CHK (8) +#define XE_GT_ERROR_FATAL_GUC (9) +#define XE_GT_ERROR_FATAL_IDI_PAR (10) +#define XE_GT_ERROR_FATAL_SQIDI (11) +#define XE_GT_ERROR_FATAL_SAMPLER (12) +#define XE_GT_ERROR_FATAL_SLM (13) +#define XE_GT_ERROR_FATAL_EU_IC (14) +#define XE_GT_ERROR_FATAL_EU_GRF (15) +#define XE_GT_ERROR_FATAL_FPU (16) +#define XE_GT_ERROR_FATAL_TLB (17) +#define XE_GT_ERROR_FATAL_L3_FABRIC (18) +#define XE_GT_ERROR_CORRECTABLE_SUBSLICE (19) +#define XE_GT_ERROR_CORRECTABLE_L3BANK (20) +#define XE_GT_ERROR_FATAL_SUBSLICE (21) +#define XE_GT_ERROR_FATAL_L3BANK (22) +#define XE_SGUNIT_ERROR_CORRECTABLE (23) +#define XE_SGUNIT_ERROR_NONFATAL (24) +#define XE_SGUNIT_ERROR_FATAL (25) +#define XE_SOC_ERROR_FATAL_PSF_CSC_0 (26) +#define XE_SOC_ERROR_FATAL_PSF_CSC_1 (27) +#define XE_SOC_ERROR_FATAL_PSF_CSC_2 (28) +#define XE_SOC_ERROR_FATAL_PUNIT (29) +#define XE_PVC_SOC_ERROR_FATAL_PSF_0 (30) +#define XE_PVC_SOC_ERROR_FATAL_PSF_1 (31) +#define XE_PVC_SOC_ERROR_FATAL_PSF_2 (32) +#define XE_PVC_SOC_ERROR_FATAL_CD0 (33) +#define XE_PVC_SOC_ERROR_FATAL_CD0_MDFI (34) +#define XE_PVC_SOC_ERROR_FATAL_MDFI_EAST (35) +#define XE_PVC_SOC_ERROR_FATAL_MDFI_SOUTH (36) + +#define XE_PVC_SOC_ERROR_FATAL_HBM(ss, n)\ + (XE_PVC_SOC_ERROR_FATAL_MDFI_SOUTH + 0x1 + (ss) * 0x10 + (n)) + +/* 68 is the last ID used by SOC errors */ +#define XE_GSC_ERROR_CORRECTABLE_SRAM_ECC (69) +#define XE_GSC_ERROR_NONFATAL_MIA_SHUTDOWN (70) +#define XE_GSC_ERROR_NONFATAL_MIA_INT (71) +#define XE_GSC_ERROR_NONFATAL_SRAM_ECC (72) +#define XE_GSC_ERROR_NONFATAL_WDG_TIMEOUT (73) +#define XE_GSC_ERROR_NONFATAL_ROM_PARITY (74) +#define XE_GSC_ERROR_NONFATAL_UCODE_PARITY (75) +#define XE_GSC_ERROR_NONFATAL_GLITCH_DET (76) +#define 
XE_GSC_ERROR_NONFATAL_FUSE_PULL (77) +#define XE_GSC_ERROR_NONFATAL_FUSE_CRC_CHECK (78) +#define XE_GSC_ERROR_NONFATAL_FUSE_SELFMBIST (79) +#define XE_GSC_ERROR_NONFATAL_AON_PARITY (80) + #if defined(__cplusplus) } #endif From patchwork Fri Aug 25 11:55:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13365489 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org Subject: [RFC v2 4/5] drm/netlink: Define multicast groups Date: Fri, 25 Aug 2023 17:25:30 +0530 Message-Id: <20230825115531.800574-5-aravind.iddamsetty@linux.intel.com> In-Reply-To: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com> References: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com> Cc: dri-devel@lists.freedesktop.org
The netlink subsystem supports event notifications to userspace. We define two multicast groups, one for correctable and one for uncorrectable errors, to which userspace can subscribe in order to be notified whenever such an error occurs. The group names are local to the driver's generic netlink family.
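For illustration only: because the group names are local to the family, a userspace listener first has to translate a name such as DRM_GENL_MCAST_GROUP_NAME_CORR_ERR ("drm_corr_err") into a numeric group ID, which the generic netlink controller provides via CTRL_CMD_GETFAMILY. The sketch below builds that request using only the kernel UAPI headers. The helper build_getfamily_req() is hypothetical, and since this patch does not show how the DRM family is named, the "card0" family name used in the check is an assumption.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a CTRL_CMD_GETFAMILY request for `family`
 * into `buf` (which must be 4-byte aligned, e.g. a uint32_t array).
 * The controller's reply carries a nested CTRL_ATTR_MCAST_GROUPS list
 * mapping names such as "drm_corr_err" to numeric group IDs.
 *
 * Returns the request length in bytes, or 0 if the family name is too
 * long or `buf` is too small.
 */
size_t build_getfamily_req(void *buf, size_t buflen, const char *family,
			   uint32_t seq)
{
	size_t name_len = strlen(family) + 1;	/* attribute includes the NUL */
	size_t attr_len = NLA_HDRLEN + name_len;
	size_t msg_len = NLMSG_LENGTH(GENL_HDRLEN) + NLA_ALIGN(attr_len);
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl;
	struct nlattr *nla;

	if (name_len > GENL_NAMSIZ || buflen < msg_len)
		return 0;
	memset(buf, 0, msg_len);	/* also zeroes the attribute padding */

	/* netlink header: addressed to the generic netlink controller */
	nlh->nlmsg_len = (uint32_t)msg_len;
	nlh->nlmsg_type = GENL_ID_CTRL;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;

	/* generic netlink header: which controller command to run */
	genl = NLMSG_DATA(nlh);
	genl->cmd = CTRL_CMD_GETFAMILY;
	genl->version = 1;

	/* single attribute: the family name to look up */
	nla = (struct nlattr *)((char *)genl + GENL_HDRLEN);
	nla->nla_type = CTRL_ATTR_FAMILY_NAME;
	nla->nla_len = (uint16_t)attr_len;
	memcpy((char *)nla + NLA_HDRLEN, family, name_len);

	return msg_len;
}
```

The request would be sent on a socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) socket. The listener then reads CTRL_ATTR_MCAST_GRP_ID from the CTRL_ATTR_MCAST_GROUPS entry whose CTRL_ATTR_MCAST_GRP_NAME matches, and joins the group with setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &grp_id, sizeof(grp_id)).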
Signed-off-by: Aravind Iddamsetty --- drivers/gpu/drm/drm_netlink.c | 2 ++ include/drm/drm_netlink.h | 10 ++++++++++ include/uapi/drm/drm_netlink.h | 3 +++ 3 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/drm_netlink.c b/drivers/gpu/drm/drm_netlink.c index e41127963968..6895b8741eaf 100644 --- a/drivers/gpu/drm/drm_netlink.c +++ b/drivers/gpu/drm/drm_netlink.c @@ -118,6 +118,8 @@ static void drm_genl_family_init(struct drm_device *dev) dev->drm_genl_family.ops = drm_genl_ops; dev->drm_genl_family.n_ops = ARRAY_SIZE(drm_genl_ops); dev->drm_genl_family.maxattr = DRM_ATTR_MAX; + dev->drm_genl_family.mcgrps = drm_event_mcgrps; + dev->drm_genl_family.n_mcgrps = ARRAY_SIZE(drm_event_mcgrps); dev->drm_genl_family.module = dev->dev->driver->owner; } diff --git a/include/drm/drm_netlink.h b/include/drm/drm_netlink.h index 60f7ee39ae0a..876f73087b18 100644 --- a/include/drm/drm_netlink.h +++ b/include/drm/drm_netlink.h @@ -28,6 +28,16 @@ static const struct nla_policy drm_attr_policy_read_one[DRM_ATTR_MAX + 1] = { [DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 }, }; +enum mcgrps_events { + DRM_GENL_MCAST_CORR_ERR, + DRM_GENL_MCAST_UNCORR_ERR, +}; + +static const struct genl_multicast_group drm_event_mcgrps[] = { + [DRM_GENL_MCAST_CORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_CORR_ERR, }, + [DRM_GENL_MCAST_UNCORR_ERR] = { .name = DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR, }, +}; + int drm_genl_register(struct drm_device *dev); void drm_genl_exit(void); int drm_genl_send(struct sk_buff *msg, struct genl_info *info, void *usrhdr); diff --git a/include/uapi/drm/drm_netlink.h b/include/uapi/drm/drm_netlink.h index a3ccc68f782d..b37f952953db 100644 --- a/include/uapi/drm/drm_netlink.h +++ b/include/uapi/drm/drm_netlink.h @@ -25,6 +25,8 @@ #define _DRM_NETLINK_H_ #define DRM_GENL_VERSION 1 +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" #if defined(__cplusplus) extern "C" { @@ -38,6 +40,7 @@ enum 
error_cmds { DRM_RAS_CMD_READ_ONE, /* command to get counters of all errors */ DRM_RAS_CMD_READ_ALL, + DRM_RAS_CMD_ERROR_EVENT, __DRM_CMD_MAX, DRM_CMD_MAX = __DRM_CMD_MAX - 1, From patchwork Fri Aug 25 11:55:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aravind Iddamsetty X-Patchwork-Id: 13365488 From: Aravind Iddamsetty To: intel-xe@lists.freedesktop.org Subject: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error Date: Fri, 25 Aug 2023 17:25:31 +0530 Message-Id: <20230825115531.800574-6-aravind.iddamsetty@linux.intel.com> In-Reply-To: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com> References: <20230825115531.800574-1-aravind.iddamsetty@linux.intel.com> Cc: dri-devel@lists.freedesktop.org
Whenever a correctable or uncorrectable error occurs, an event is sent to the listeners of the corresponding multicast group.
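Note that the event message carries no attributes (genlmsg_put() is followed directly by genlmsg_end()), so a listener only learns which class of error occurred. One way userspace could find the affected counter is to keep its previous DRM_RAS_CMD_READ_ALL reply and compare it with a fresh one. The sketch below shows only that comparison on already-parsed (ID, value) pairs. The struct ras_counter type and the find_changed() helper are hypothetical; HW_ERROR_ID() and the shift of 56 mirror XE_HW_ERROR() and __XE_GT_SHIFT from xe_drm.h in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors __XE_GT_SHIFT / XE_HW_ERROR() from xe_drm.h in this series. */
#define GT_SHIFT 56
#define HW_ERROR_ID(gt, idx) ((uint64_t)(idx) | ((uint64_t)(gt) << GT_SHIFT))

/* Hypothetical parsed form of one READ_ALL entry. */
struct ras_counter {
	uint64_t id;	/* DRM_RAS_ATTR_ERROR_ID */
	uint64_t value;	/* DRM_RAS_ATTR_ERROR_VALUE */
};

/* Split an error ID back into its GT and per-GT error index. */
static unsigned int id_gt(uint64_t id)
{
	return (unsigned int)(id >> GT_SHIFT);
}

static uint64_t id_index(uint64_t id)
{
	return id & ~(~0ULL << GT_SHIFT);
}

/*
 * Compare two READ_ALL snapshots and store into `changed` (at most `max`
 * entries) the IDs whose counter grew. IDs missing from `before` are
 * treated as having been 0. Returns the number of IDs stored.
 */
size_t find_changed(const struct ras_counter *before, size_t nb,
		    const struct ras_counter *after, size_t na,
		    uint64_t *changed, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < na && count < max; i++) {
		uint64_t old = 0;

		for (size_t j = 0; j < nb; j++) {
			if (before[j].id == after[i].id) {
				old = before[j].value;
				break;
			}
		}
		if (after[i].value > old)
			changed[count++] = after[i].id;
	}
	return count;
}
```

A listener would run this after each event it receives and then replace its saved snapshot with the new reply. Matching by ID rather than by position avoids relying on the order in which the driver emits entries.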
Signed-off-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_irq.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index 226be96e341a..5eb041d7e07c 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -1073,6 +1073,37 @@ xe_gsc_hw_error_handler(struct xe_gt *gt, const enum hardware_error hw_err) xe_mmio_write32(gt, GSC_HEC_CORR_UNCORR_ERR_STATUS(base, hw_err).reg, err_status); } +static void generate_netlink_event(struct xe_gt *gt, const enum hardware_error hw_err) +{ + struct xe_device *xe = gt->xe; + struct sk_buff *msg; + void *hdr; + + if (!xe->drm.drm_genl_family.module) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC); + if (!msg) { + drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n"); + return; + } + + hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT); + if (!hdr) { + drm_dbg_driver(&xe->drm, "multicast msg buffer is too small\n"); + nlmsg_free(msg); + return; + } + + genlmsg_end(msg, hdr); + + genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0, + hw_err == HARDWARE_ERROR_CORRECTABLE ? + DRM_GENL_MCAST_CORR_ERR : + DRM_GENL_MCAST_UNCORR_ERR, + GFP_ATOMIC); +} + static void xe_hw_error_source_handler(struct xe_gt *gt, const enum hardware_error hw_err) { @@ -1103,6 +1134,7 @@ xe_hw_error_source_handler(struct xe_gt *gt, const enum hardware_error hw_err) xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err).reg, errsrc); + generate_netlink_event(gt, hw_err); out_unlock: spin_unlock_irqrestore(&gt_to_xe(gt)->irq.lock, flags); }