From patchwork Thu Feb 27 19:14:50 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995058
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com, matthew.brost@intel.com, jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH v2 1/8] drm/xe/xe_gt_pagefault: Disallow writes to read-only VMAs
Date: Thu, 27 Feb 2025 19:14:50 +0000
Message-ID: <20250227191457.84035-2-jonathan.cavitt@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>
List-Id: Direct Rendering Infrastructure - Development

The page fault handler should reject write and atomic accesses to
read-only VMAs.  Add code to handle this in handle_pagefault after the
VMA lookup.

Fixes: 3d420e9fa848 ("drm/xe: Rework GPU page fault handling")
Signed-off-by: Jonathan Cavitt
Suggested-by: Matthew Brost
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 17d69039b866..f608a765fa7c 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -235,6 +235,11 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		goto unlock_vm;
 	}

+	if (xe_vma_read_only(vma) && pf->access_type != ACCESS_TYPE_READ) {
+		err = -EPERM;
+		goto unlock_vm;
+	}
+
 	err = handle_vma_pagefault(gt, pf, vma);

 unlock_vm:

From patchwork Thu Feb 27 19:14:51 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995056
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com, matthew.brost@intel.com, jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH v2 2/8] drm/xe/xe_exec_queue: Add ID param to exec queue struct
Date: Thu, 27 Feb 2025 19:14:51 +0000
Message-ID: <20250227191457.84035-3-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add the exec queue id to the exec queue struct.  This is useful for
performing a reverse lookup into the xef->exec_queue xarray.

Signed-off-by: Jonathan Cavitt
---
 drivers/gpu/drm/xe/xe_exec_queue.c       | 1 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 23a9f519ce1c..4a98a5d0e405 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -709,6 +709,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		goto kill_exec_queue;

+	q->id = id;
 	args->exec_queue_id = id;

 	return 0;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 6eb7ff091534..088d838218e9 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -55,6 +55,8 @@ struct xe_exec_queue {
 	struct xe_vm *vm;
 	/** @class: class of this exec queue */
 	enum xe_engine_class class;
+	/** @id: exec queue ID as reported during create ioctl */
+	u32 id;
 	/**
 	 * @logical_mask: logical mask of where job submitted to exec queue can run
 	 */

From patchwork Thu Feb 27 19:14:52 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995055
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com, matthew.brost@intel.com, jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH v2 3/8] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
Date: Thu, 27 Feb 2025 19:14:52 +0000
Message-ID: <20250227191457.84035-4-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Migrate the pagefault struct from xe_gt_pagefault.c to the
xe_gt_pagefault.h header file, along with the associated enum values.

v2: Normalize names for common header (Matt Brost)

Signed-off-by: Jonathan Cavitt
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 43 ++++++----------------------
 drivers/gpu/drm/xe/xe_gt_pagefault.h | 28 ++++++++++++++++++
 2 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index f608a765fa7c..07b52d3c1a60 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -22,33 +22,6 @@
 #include "xe_trace_bo.h"
 #include "xe_vm.h"

-struct pagefault {
-	u64 page_addr;
-	u32 asid;
-	u16 pdata;
-	u8 vfid;
-	u8 access_type;
-	u8 fault_type;
-	u8 fault_level;
-	u8 engine_class;
-	u8 engine_instance;
-	u8 fault_unsuccessful;
-	bool trva_fault;
-};
-
-enum access_type {
-	ACCESS_TYPE_READ = 0,
-	ACCESS_TYPE_WRITE = 1,
-	ACCESS_TYPE_ATOMIC = 2,
-	ACCESS_TYPE_RESERVED = 3,
-};
-
-enum fault_type {
-	NOT_PRESENT = 0,
-	WRITE_ACCESS_VIOLATION = 1,
-	ATOMIC_ACCESS_VIOLATION = 2,
-};
-
 struct acc {
 	u64 va_range_base;
 	u32 asid;
@@ -60,9 +33,9 @@ struct acc {
 	u8 engine_instance;
 };

-static bool access_is_atomic(enum access_type access_type)
+static bool access_is_atomic(enum xe_pagefault_access_type access_type)
 {
-	return access_type == ACCESS_TYPE_ATOMIC;
+	return access_type == XE_PAGEFAULT_ACCESS_TYPE_ATOMIC;
 }

 static bool vma_is_valid(struct xe_tile *tile, struct xe_vma *vma)
@@ -125,7 +98,7 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 	return 0;
 }

-static int handle_vma_pagefault(struct xe_gt *gt, struct pagefault *pf,
+static int handle_vma_pagefault(struct xe_gt *gt, struct xe_pagefault *pf,
 				struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
@@ -204,7 +177,7 @@ static struct xe_vm *asid_to_vm(struct xe_device *xe, u32 asid)
 	return vm;
 }

-static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
+static int handle_pagefault(struct xe_gt *gt, struct xe_pagefault *pf)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_vm *vm;
@@ -235,7 +208,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		goto unlock_vm;
 	}

-	if (xe_vma_read_only(vma) && pf->access_type != ACCESS_TYPE_READ) {
+	if (xe_vma_read_only(vma) && pf->access_type != XE_PAGEFAULT_ACCESS_TYPE_READ) {
 		err = -EPERM;
 		goto unlock_vm;
 	}
@@ -263,7 +236,7 @@ static int send_pagefault_reply(struct xe_guc *guc,
 	return xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
 }

-static void print_pagefault(struct xe_device *xe, struct pagefault *pf)
+static void print_pagefault(struct xe_device *xe, struct xe_pagefault *pf)
 {
 	drm_dbg(&xe->drm, "\n\tASID: %d\n"
 		 "\tVFID: %d\n"
@@ -283,7 +256,7 @@ static void print_pagefault(struct xe_device *xe, struct pagefault *pf)

 #define PF_MSG_LEN_DW 4

-static bool get_pagefault(struct pf_queue *pf_queue, struct pagefault *pf)
+static bool get_pagefault(struct pf_queue *pf_queue, struct xe_pagefault *pf)
 {
 	const struct xe_guc_pagefault_desc *desc;
 	bool ret = false;
@@ -370,7 +343,7 @@ static void pf_queue_work_func(struct work_struct *w)
 	struct xe_gt *gt = pf_queue->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_guc_pagefault_reply reply = {};
-	struct pagefault pf = {};
+	struct xe_pagefault pf = {};
 	unsigned long threshold;
 	int ret;

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
index 839c065a5e4c..33616043d17a 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
@@ -11,6 +11,34 @@
 struct xe_gt;
 struct xe_guc;

+struct xe_pagefault {
+	u64 page_addr;
+	u32 asid;
+	u16 pdata;
+	u8 vfid;
+	u8 access_type;
+	u8 fault_type;
+	u8 fault_level;
+	u8 engine_class;
+	u8 engine_instance;
+	u8 fault_unsuccessful;
+	bool prefetch;
+	bool trva_fault;
+};
+
+enum xe_pagefault_access_type {
+	XE_PAGEFAULT_ACCESS_TYPE_READ = 0,
+	XE_PAGEFAULT_ACCESS_TYPE_WRITE = 1,
+	XE_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
+	XE_PAGEFAULT_ACCESS_TYPE_RESERVED = 3,
+};
+
+enum xe_pagefault_type {
+	XE_PAGEFAULT_TYPE_NOT_PRESENT = 0,
+	XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION = 1,
+	XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION = 2,
+};
+
 int xe_gt_pagefault_init(struct xe_gt *gt);
 void xe_gt_pagefault_reset(struct xe_gt *gt);
 int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len);

From patchwork Thu Feb 27 19:14:53 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995060
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com, matthew.brost@intel.com, jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH v2 4/8] drm/xe/xe_vm: Add per VM pagefault info
Date: Thu, 27 Feb 2025 19:14:53 +0000
Message-ID: <20250227191457.84035-5-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add additional information to the VM so it can report up to the last 50
relevant exec queues to have been banned on it, as well as the last
pagefault seen when said exec queues were banned.  Since we cannot
reasonably associate a pagefault with a specific exec queue, we
currently report the last seen pagefault on the associated VM instead.

The last pagefault seen per exec queue is saved to the VM, and the
pagefault is updated during the pagefault handling process in
xe_gt_pagefault.  The last seen pagefault is reset once it has been
associated with the next banned exec queue.

Signed-off-by: Jonathan Cavitt
Suggested-by: Matthew Brost
---
 drivers/gpu/drm/xe/xe_exec_queue.c   |  6 +++
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 16 +++++++
 drivers/gpu/drm/xe/xe_guc_submit.c   |  2 +
 drivers/gpu/drm/xe/xe_vm.c           | 69 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h           |  6 +++
 drivers/gpu/drm/xe/xe_vm_types.h     | 31 +++++++++++++
 6 files changed, 130 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 4a98a5d0e405..e0764f3dfd76 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -712,6 +712,12 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 	q->id = id;
 	args->exec_queue_id = id;

+	/**
+	 * If an exec queue in the ban list shares the same exec queue
+	 * ID, remove it from the ban list to avoid confusion.
+	 */
+	xe_vm_remove_ban_entry(q->vm, q);
+
 	return 0;

 kill_exec_queue:
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 07b52d3c1a60..e23b9d33afa5 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -335,6 +335,21 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	return full ? -ENOSPC : 0;
 }

+static void save_pagefault_to_vm(struct xe_device *xe, struct xe_pagefault *pf)
+{
+	struct xe_vm *vm;
+
+	vm = asid_to_vm(xe, pf->asid);
+	if (IS_ERR(vm))
+		return;
+
+	spin_lock(&vm->pf.lock);
+	if (!vm->pf.info)
+		vm->pf.info = kzalloc(sizeof(*pf), GFP_KERNEL);
+	memcpy(vm->pf.info, pf, sizeof(*pf));
+	spin_unlock(&vm->pf.lock);
+}
+
 #define USM_QUEUE_MAX_RUNTIME_MS 20

 static void pf_queue_work_func(struct work_struct *w)
@@ -353,6 +368,7 @@ static void pf_queue_work_func(struct work_struct *w)
 		ret = handle_pagefault(gt, &pf);
 		if (unlikely(ret)) {
 			print_pagefault(xe, &pf);
+			save_pagefault_to_vm(xe, &pf);
 			pf.fault_unsuccessful = 1;
 			drm_dbg(&xe->drm, "Fault response: Unsuccessful %d\n", ret);
 		}
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index b6a2dd742ebd..f0bfc9d109cb 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2066,6 +2066,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
 	if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
 		xe_guc_exec_queue_trigger_cleanup(q);

+	xe_vm_add_ban_entry(q->vm, q);
+
 	return 0;
 }

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 996000f2424e..3e88652670e6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -746,6 +746,62 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
 		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }

+static void free_ban_entry(struct xe_exec_queue_ban_entry *b)
+{
+	list_del(&b->list);
+	kfree(b->pf);
+	kfree(b);
+}
+
+void xe_vm_add_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+	struct xe_exec_queue_ban_entry *b = NULL;
+	struct xe_file *xef = q->xef;
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	xe_assert(xef->xe, b);
+
+	spin_lock(&vm->bans.lock);
+	list_add_tail(&b->list, &vm->bans.list);
+	vm->bans.len++;
+	/**
+	 * Limit the number of bans in the bans list to prevent memory overuse.
+	 */
+	if (vm->bans.len > MAX_BANS) {
+		struct xe_exec_queue_ban_entry *rem =
+			list_first_entry(&vm->bans.list, struct xe_exec_queue_ban_entry, list);
+
+		free_ban_entry(rem);
+		vm->bans.len--;
+	}
+	spin_unlock(&vm->bans.lock);
+
+	/**
+	 * Associate the current pagefault saved to the VM to the ban entry, and clear
+	 * the VM pagefault cache.  This is still valid if vm->pf.info is NULL.
+	 */
+	spin_lock(&vm->pf.lock);
+	b->pf = vm->pf.info;
+	vm->pf.info = NULL;
+	spin_unlock(&vm->pf.lock);
+
+	/** Save blame data to list element */
+	b->exec_queue_id = q->id;
+}
+
+void xe_vm_remove_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+	struct xe_exec_queue_ban_entry *b, *tmp;
+
+	spin_lock(&vm->bans.lock);
+	list_for_each_entry_safe(b, tmp, &vm->bans.list, list)
+		if (b->exec_queue_id == q->id) {
+			free_ban_entry(b);
+			vm->bans.len--;
+		}
+	spin_unlock(&vm->bans.lock);
+}
+
 static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
 {
 	int i;
@@ -1448,6 +1504,10 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	init_rwsem(&vm->userptr.notifier_lock);
 	spin_lock_init(&vm->userptr.invalidated_lock);

+	INIT_LIST_HEAD(&vm->bans.list);
+	spin_lock_init(&vm->bans.lock);
+	spin_lock_init(&vm->pf.lock);
+
 	ttm_lru_bulk_move_init(&vm->lru_bulk_move);

 	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
@@ -1672,6 +1732,15 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	}
 	up_write(&xe->usm.lock);

+	if (vm->bans.len) {
+		struct xe_exec_queue_ban_entry *b, *tmp;
+
+		spin_lock(&vm->bans.lock);
+		list_for_each_entry_safe(b, tmp, &vm->bans.list, list)
+			free_ban_entry(b);
+		spin_unlock(&vm->bans.lock);
+	}
+
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);

diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f66075f8a6fe..9f8457ceb905 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -12,6 +12,8 @@
 #include "xe_map.h"
 #include "xe_vm_types.h"

+#define MAX_BANS 50
+
 struct drm_device;
 struct drm_printer;
 struct drm_file;
@@ -244,6 +246,10 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);

 int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);

+void xe_vm_add_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q);
+
+void xe_vm_remove_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q);
+
 bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);

 int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 52467b9b5348..e7e2d682b1b6 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -18,6 +18,7 @@
 #include "xe_range_fence.h"

 struct xe_bo;
+struct xe_pagefault;
 struct xe_sync_entry;
 struct xe_user_fence;
 struct xe_vm;
@@ -135,6 +136,15 @@ struct xe_userptr_vma {

 struct xe_device;

+struct xe_exec_queue_ban_entry {
+	/** @exec_queue_id: ID number of banned exec queue */
+	u32 exec_queue_id;
+	/** @pf: pagefault on engine of banned exec queue, if any at time */
+	struct xe_pagefault *pf;
+	/** @list: link into @xe_vm.bans.list */
+	struct list_head list;
+};
+
 struct xe_vm {
 	/** @gpuvm: base GPUVM used to track VMAs */
 	struct drm_gpuvm gpuvm;
@@ -274,6 +284,27 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;

+	/**
+	 * @bans: List of relevant banned exec queues associated with this
+	 * vm, as well as any pagefaults at time of ban.
+	 */
+	struct {
+		/** @bans.lock: lock protecting @bans.list */
+		spinlock_t lock;
+		/** @bans.list: list of xe_exec_queue_ban_entry entries */
+		struct list_head list;
+		/** @bans.len: length of @bans.list */
+		unsigned int len;
+	} bans;
+
+	/** @pf: the last pagefault seen on this VM */
+	struct {
+		/** @pf.info: info containing last seen pagefault details */
+		struct xe_pagefault *info;
+		/** @pf.lock: lock protecting @pf.info */
+		spinlock_t lock;
+	} pf;
+
 	/**
 	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
 	 * protected by the vm resv.

From patchwork Thu Feb 27 19:14:54 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995059
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com, matthew.brost@intel.com, jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH v2 5/8] drm/xe/xe_vm: Add per VM reset stats
Date: Thu, 27 Feb 2025 19:14:54 +0000
Message-ID: <20250227191457.84035-6-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add a counter to xe_vm that tracks the number of times an engine reset
has been observed with respect to the VM since creation.

Signed-off-by: Jonathan Cavitt
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
 drivers/gpu/drm/xe/xe_vm_types.h   | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f0bfc9d109cb..e4c2413ed47e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1990,6 +1990,8 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)

 	trace_xe_exec_queue_reset(q);

+	atomic_inc(&q->vm->reset_count);
+
 	/*
 	 * A banned engine is a NOP at this point (came from
 	 * guc_exec_queue_timedout_job). Otherwise, kick drm scheduler to cancel
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index e7e2d682b1b6..a448402250e5 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -305,6 +305,9 @@ struct xe_vm {
 		spinlock_t lock;
 	} pf;

+	/** @reset_count: number of times this VM has seen an engine reset */
+	atomic_t reset_count;
+
 	/**
 	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
 	 * protected by the vm resv.
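
The bounded ban list described in patch 4/8 (keep at most 50 entries,
evict the oldest, hand the cached pagefault to the newest ban, and drop
entries whose ID is reused) can be modelled in plain userspace C.  The
sketch below is an illustration only and is not the driver code: every
name in it (MODEL_MAX_BANS, struct fault_info, struct ban_entry,
struct vm_model, vm_save_fault(), vm_add_ban(), vm_remove_ban(),
vm_free()) is hypothetical, a singly linked list stands in for
list_head, and locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the xe driver. */
#define MODEL_MAX_BANS 50

struct fault_info {
	uint64_t page_addr;
	uint32_t asid;
};

struct ban_entry {
	uint32_t exec_queue_id;
	struct fault_info *pf;	/* NULL when no fault was cached at ban time */
	struct ban_entry *next;
};

struct vm_model {
	struct ban_entry *head;	/* oldest ban */
	struct ban_entry *tail;	/* newest ban */
	unsigned int len;
	struct fault_info *last_pf;	/* last failed fault not yet claimed by a ban */
};

/* Cache the most recent failed fault, overwriting any unclaimed one. */
int vm_save_fault(struct vm_model *vm, const struct fault_info *pf)
{
	assert(vm && pf);
	if (!vm->last_pf) {
		vm->last_pf = malloc(sizeof(*vm->last_pf));
		if (!vm->last_pf)
			return -1;
	}
	*vm->last_pf = *pf;
	return 0;
}

/* Append a ban, hand it the cached fault, then evict the oldest if over the cap. */
int vm_add_ban(struct vm_model *vm, uint32_t exec_queue_id)
{
	struct ban_entry *b = calloc(1, sizeof(*b));

	if (!b)
		return -1;
	b->exec_queue_id = exec_queue_id;
	b->pf = vm->last_pf;	/* ownership moves to the entry */
	vm->last_pf = NULL;

	if (vm->tail)
		vm->tail->next = b;
	else
		vm->head = b;
	vm->tail = b;

	if (++vm->len > MODEL_MAX_BANS) {
		struct ban_entry *old = vm->head;

		vm->head = old->next;
		free(old->pf);
		free(old);
		vm->len--;
	}
	return 0;
}

/* Drop every entry whose ID is being reused by a newly created queue. */
void vm_remove_ban(struct vm_model *vm, uint32_t exec_queue_id)
{
	struct ban_entry **link = &vm->head, *prev = NULL;

	while (*link) {
		struct ban_entry *b = *link;

		if (b->exec_queue_id != exec_queue_id) {
			prev = b;
			link = &b->next;
			continue;
		}
		*link = b->next;
		if (vm->tail == b)
			vm->tail = prev;
		free(b->pf);
		free(b);
		vm->len--;
	}
}

/* Release everything, as a teardown path would. */
void vm_free(struct vm_model *vm)
{
	while (vm->head) {
		struct ban_entry *b = vm->head;

		vm->head = b->next;
		free(b->pf);
		free(b);
	}
	vm->tail = NULL;
	vm->len = 0;
	free(vm->last_pf);
	vm->last_pf = NULL;
}
```

In this model, saving one fault and then adding 61 bans to an empty
vm_model leaves the 50 newest entries; the first entry, which had
claimed the cached fault, is evicted together with that fault.  The
model fills in the entry before linking it and reports allocation
failure to the caller, which is a choice made for the sketch rather
than a description of the patch.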
From patchwork Thu Feb 27 19:14:55 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995062
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com,
 jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com,
 matthew.brost@intel.com, jianxun.zhang@intel.com,
 dri-devel@lists.freedesktop.org
Subject: [PATCH v2 6/8] drm/xe/uapi: Define drm_xe_vm_get_property
Date: Thu, 27 Feb 2025 19:14:55 +0000
Message-ID: <20250227191457.84035-7-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add initial declarations for the drm_xe_vm_get_property ioctl.

Signed-off-by: Jonathan Cavitt
---
 include/uapi/drm/xe_drm.h | 67 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 76a462fae05f..78a5285bc5f8 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -81,6 +81,7 @@ extern "C" {
  *  - &DRM_IOCTL_XE_EXEC
  *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
  *  - &DRM_IOCTL_XE_OBSERVATION
+ *  - %DRM_IOCTL_XE_VM_GET_PROPERTY
  */
 
 /*
@@ -102,6 +103,7 @@ extern "C" {
 #define DRM_XE_EXEC			0x09
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
+#define DRM_XE_VM_GET_PROPERTY		0x0c
 
 /* Must be kept compact -- no holes */
 
@@ -117,6 +119,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_VM_GET_PROPERTY		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_GET_PROPERTY, struct drm_xe_vm_get_property)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1166,6 +1169,70 @@ struct drm_xe_vm_bind {
 	__u64 reserved[2];
 };
 
+struct drm_xe_ban {
+	/** @exec_queue_id: ID of banned exec queue */
+	__u32 exec_queue_id;
+	/** @faulted: Whether or not the ban has an associated pagefault. 0 is no, 1 is yes */
+	__u32 faulted;
+	/** @address: Address of the fault, if relevant */
+	__u64 address;
+#define DRM_XE_FAULT_ADDRESS_TYPE_NONE_EXT		0
+#define DRM_XE_FAULT_ADDRESS_TYPE_READ_INVALID_EXT	1
+#define DRM_XE_FAULT_ADDRESS_TYPE_WRITE_INVALID_EXT	2
+	/** @address_type: , if relevant */
+	__u32 address_type;
+	/**
+	 * @address_precision: Precision of faulted address, if relevant.
+	 * Currently only SZ_4K.
+	 */
+	__u32 address_precision;
+	/** @reserved: MBZ */
+	__u64 reserved[3];
+};
+
+/**
+ * struct drm_xe_vm_get_property - Input of &DRM_IOCTL_XE_VM_GET_PROPERTY
+ *
+ * The user provides a VM ID and a property to query to this ioctl,
+ * and the ioctl returns the size of the return value. Calling the
+ * ioctl again with memory reserved for the data will save the
+ * requested property data to the data pointer.
+ *
+ * The valid properties are:
+ *  - %DRM_XE_VM_GET_PROPERTY_FAULTS : List of all pagefaults that resulted in exec queue bans
+ *  - %DRM_XE_VM_GET_PROPERTY_BANS : List of all exec queue bans
+ *  - %DRM_XE_VM_GET_PROPERTY_NUM_RESETS : Number of engine resets seen by VM.
+ */
+struct drm_xe_vm_get_property {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @vm_id: The ID of the VM to query the properties of */
+	__u32 vm_id;
+
+#define DRM_XE_VM_GET_PROPERTY_FAULTS		0
+#define DRM_XE_VM_GET_PROPERTY_BANS		1
+#define DRM_XE_VM_GET_PROPERTY_NUM_RESETS	2
+	/** @property: The property to get */
+	__u32 property;
+
+	/** @size: Size of returned property @data */
+	__u32 size;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	union {
+		/** @data: Return for scalar data values */
+		__u64 data;
+		/** @ptr: Pointer to user structs when required */
+		__u64 ptr;
+	};
+
+	/** @reserved: MBZ */
+	__u64 reserved[2];
+};
+
 /**
  * struct drm_xe_exec_queue_create - Input of &DRM_IOCTL_XE_EXEC_QUEUE_CREATE
  *

From patchwork Thu Feb 27 19:14:56 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995063
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com,
 jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com,
 matthew.brost@intel.com, jianxun.zhang@intel.com,
 dri-devel@lists.freedesktop.org
Subject: [PATCH v2 7/8] drm/xe/xe_gt_pagefault: Add address_type field to pagefaults
Date: Thu, 27 Feb 2025 19:14:56 +0000
Message-ID: <20250227191457.84035-8-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add a new field to the xe_pagefault struct, address_type, that tracks
the type of fault the pagefault incurred.

Signed-off-by: Jonathan Cavitt
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 3 +++
 drivers/gpu/drm/xe/xe_gt_pagefault.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index e23b9d33afa5..aae94dc3a99f 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -204,11 +204,13 @@ static int handle_pagefault(struct xe_gt *gt, struct xe_pagefault *pf)
 
 	vma = lookup_vma(vm, pf->page_addr);
 	if (!vma) {
+		pf->address_type = DRM_XE_FAULT_ADDRESS_TYPE_NONE_EXT;
 		err = -EINVAL;
 		goto unlock_vm;
 	}
 
 	if (xe_vma_read_only(vma) && pf->access_type != XE_PAGEFAULT_ACCESS_TYPE_READ) {
+		pf->address_type = DRM_XE_FAULT_ADDRESS_TYPE_WRITE_INVALID_EXT;
 		err = -EPERM;
 		goto unlock_vm;
 	}
@@ -276,6 +278,7 @@ static bool get_pagefault(struct pf_queue *pf_queue, struct xe_pagefault *pf)
 		pf->asid = FIELD_GET(PFD_ASID, desc->dw1);
 		pf->vfid = FIELD_GET(PFD_VFID, desc->dw2);
 		pf->access_type = FIELD_GET(PFD_ACCESS_TYPE, desc->dw2);
+		pf->address_type = 0;
 		pf->fault_type = FIELD_GET(PFD_FAULT_TYPE, desc->dw2);
 		pf->page_addr = (u64)(FIELD_GET(PFD_VIRTUAL_ADDR_HI, desc->dw3)) <<
 			PFD_VIRTUAL_ADDR_HI_SHIFT;
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
index 33616043d17a..969f7b458d3f 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
@@ -17,6 +17,7 @@ struct xe_pagefault {
 	u16 pdata;
 	u8 vfid;
 	u8 access_type;
+	u8 address_type;
 	u8 fault_type;
 	u8 fault_level;
 	u8 engine_class;

From patchwork Thu Feb 27 19:14:57 2025
X-Patchwork-Submitter: Jonathan Cavitt
X-Patchwork-Id: 13995061
From: Jonathan Cavitt
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com,
 jonathan.cavitt@intel.com, joonas.lahtinen@linux.intel.com,
 matthew.brost@intel.com, jianxun.zhang@intel.com,
 dri-devel@lists.freedesktop.org
Subject: [PATCH v2 8/8] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
Date: Thu, 27 Feb 2025 19:14:57 +0000
Message-ID: <20250227191457.84035-9-jonathan.cavitt@intel.com>
In-Reply-To: <20250227191457.84035-1-jonathan.cavitt@intel.com>
References: <20250227191457.84035-1-jonathan.cavitt@intel.com>

Add support for userspace to get various properties from a specified VM.
The currently supported properties are:

- The number of engine resets the VM has observed
- The number of exec queue bans the VM has observed, up to the last 50
  relevant ones, in total.
- The number of exec queue bans the VM has observed, up to the last 50
  relevant ones, that were caused by faults.

The latter two requests also include information on the exec queue bans
themselves, such as the ID of the banned exec queue and, when relevant,
the faulting address, address type, and address precision.

Signed-off-by: Jonathan Cavitt
---
 drivers/gpu/drm/xe/xe_device.c |   3 +
 drivers/gpu/drm/xe/xe_vm.c     | 102 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h     |   2 +
 3 files changed, 107 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 9454b51f7ad8..43accae152ff 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -193,6 +193,9 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_VM_GET_PROPERTY, xe_vm_get_property_ioctl,
+			  DRM_RENDER_ALLOW),
+
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3e88652670e6..8ac54aaca51a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3258,6 +3258,108 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	return err;
 }
 
+static u32 xe_vm_get_property_size(struct xe_vm *vm, u32 property)
+{
+	u32 size = 0;
+
+	switch (property) {
+	case DRM_XE_VM_GET_PROPERTY_FAULTS:
+		struct xe_exec_queue_ban_entry *entry;
+
+		spin_lock(&vm->bans.lock);
+		list_for_each_entry(entry, &vm->bans.list, list) {
+			struct xe_pagefault *pf = entry->pf;
+
+			size += pf ? sizeof(struct drm_xe_ban) : 0;
+		}
+		spin_unlock(&vm->bans.lock);
+		return size;
+	case DRM_XE_VM_GET_PROPERTY_BANS:
+		spin_lock(&vm->bans.lock);
+		size = vm->bans.len * sizeof(struct drm_xe_ban);
+		spin_unlock(&vm->bans.lock);
+		return size;
+	case DRM_XE_VM_GET_PROPERTY_NUM_RESETS:
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int fill_property_bans(struct xe_vm *vm,
+			      struct drm_xe_vm_get_property *args,
+			      u32 size, bool faults_only)
+{
+	struct drm_xe_ban __user *usr_ptr = u64_to_user_ptr(args->ptr);
+	struct drm_xe_ban *ban_list;
+	struct drm_xe_ban *ban;
+	struct xe_exec_queue_ban_entry *entry;
+	int i = 0;
+
+	if (copy_from_user(&ban_list, usr_ptr, size))
+		return -EFAULT;
+
+	spin_lock(&vm->bans.lock);
+	list_for_each_entry(entry, &vm->bans.list, list) {
+		struct xe_pagefault *pf = entry->pf;
+
+		if (!pf && faults_only)
+			continue;
+
+		ban = &ban_list[i++];
+		ban->exec_queue_id = entry->exec_queue_id;
+		ban->faulted = !pf ? 0 : 1;
+		ban->address = pf ? pf->page_addr : 0;
+		ban->address_type = pf ? pf->address_type : 0;
+		ban->address_precision = SZ_4K;
+	}
+	spin_unlock(&vm->bans.lock);
+
+	if (copy_to_user(usr_ptr, &ban_list, size))
+		return -EFAULT;
+
+	return 0;
+}
+
+int xe_vm_get_property_ioctl(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(drm);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_vm_get_property *args = data;
+	struct xe_vm *vm;
+	u32 size;
+
+	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+		return -EINVAL;
+
+	vm = xe_vm_lookup(xef, args->vm_id);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -ENOENT;
+
+	size = xe_vm_get_property_size(vm, args->property);
+	if (size < 0) {
+		return size;
+	} else if (args->size != size) {
+		if (args->size)
+			return -EINVAL;
+		args->size = size;
+		return 0;
+	}
+
+	switch (args->property) {
+	case DRM_XE_VM_GET_PROPERTY_FAULTS:
+		return fill_property_bans(vm, args, size, true);
+	case DRM_XE_VM_GET_PROPERTY_BANS:
+		return fill_property_bans(vm, args, size, false);
+	case DRM_XE_VM_GET_PROPERTY_NUM_RESETS:
+		args->data = atomic_read(&vm->reset_count);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 /**
  * xe_vm_bind_kernel_bo - bind a kernel BO to a VM
  * @vm: VM to bind the BO to
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 9f8457ceb905..0338f42f7a71 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -184,6 +184,8 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *file);
+int xe_vm_get_property_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
 
 void xe_vm_close_and_put(struct xe_vm *vm);
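
The size handshake in xe_vm_get_property_ioctl (a first call with size 0 reports the required size, a second call with a matching size fills the buffer, and any other non-zero size is rejected with -EINVAL) may be easier to follow from the caller's side. The sketch below is a self-contained userspace model, not the driver code: `struct mock_ban`, `struct mock_get_property`, `mock_vm_get_property()` and `fetch_bans()` are hypothetical stand-ins, and the mock reports fixed data instead of issuing a real ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; a reduced stand-in for struct drm_xe_ban. */
struct mock_ban {
	uint32_t exec_queue_id;
	uint32_t faulted;
	uint64_t address;
};

/* Hypothetical argument block; a stand-in for struct drm_xe_vm_get_property. */
struct mock_get_property {
	uint32_t size;	/* in: 0 to query, else buffer size; out: required size */
	uint64_t ptr;	/* in: buffer address for the second call */
};

/* Fixed data the mock "kernel" reports. */
static const struct mock_ban mock_bans[2] = {
	{ .exec_queue_id = 7, .faulted = 1, .address = 0x1000 },
	{ .exec_queue_id = 9, .faulted = 0, .address = 0 },
};

/* Mock of the size handshake: report the size, or fill the buffer. */
static int mock_vm_get_property(struct mock_get_property *args)
{
	uint32_t need = sizeof(mock_bans);

	if (args->size != need) {
		if (args->size)
			return -EINVAL;	/* caller passed a stale or wrong size */
		args->size = need;	/* first call: report the required size */
		return 0;
	}
	memcpy((void *)(uintptr_t)args->ptr, mock_bans, need);
	return 0;
}

/* Caller side: query the size, allocate, then fetch. Returns NULL on error. */
static struct mock_ban *fetch_bans(uint32_t *count)
{
	struct mock_get_property args = { 0 };
	struct mock_ban *bans;

	if (mock_vm_get_property(&args) || !args.size)
		return NULL;
	bans = calloc(1, args.size);
	if (!bans)
		return NULL;
	args.ptr = (uint64_t)(uintptr_t)bans;
	if (mock_vm_get_property(&args)) {
		free(bans);
		return NULL;
	}
	*count = args.size / sizeof(*bans);
	return bans;
}
```

Since the required size can change between the two calls when a new ban is recorded, a caller written against this protocol would likely want to treat -EINVAL on the second call as a cue to repeat the size query rather than as a hard failure.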