From patchwork Wed Feb 26 22:55:51 2025
X-Patchwork-Id: 13993283
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 1/6] drm/xe/xe_gt_pagefault: Migrate lookup_vma to xe_vm.h
Date: Wed, 26 Feb 2025 22:55:51 +0000
Message-ID: <20250226225557.133076-2-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Make lookup_vma a static inline header function for xe_vm.
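As an illustration of the intended reuse, a hypothetical caller outside of
xe_gt_pagefault.c could resolve a faulting address with the helper added
below (sketch only, not part of this patch; it assumes the caller already
holds the VM lock the way the fault handler does):

        static int check_fault_addr(struct xe_vm *vm, u64 page_addr)
        {
                struct xe_vma *vma;

                /* Fast path via vm->usm.last_fault_vma, then overlapping-VMA search */
                vma = xe_vm_lookup_vma(vm, page_addr);
                if (!vma)
                        return -EINVAL; /* nothing mapped at this 4K page */

                return 0;
        }
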
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 25 +------------------------
 drivers/gpu/drm/xe/xe_vm.h           | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 17d69039b866..4a4cf0c4b68d 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -71,29 +71,6 @@ static bool vma_is_valid(struct xe_tile *tile, struct xe_vma *vma)
                !(BIT(tile->id) & vma->tile_invalidated);
 }
 
-static bool vma_matches(struct xe_vma *vma, u64 page_addr)
-{
-        if (page_addr > xe_vma_end(vma) - 1 ||
-            page_addr + SZ_4K - 1 < xe_vma_start(vma))
-                return false;
-
-        return true;
-}
-
-static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
-{
-        struct xe_vma *vma = NULL;
-
-        if (vm->usm.last_fault_vma) {   /* Fast lookup */
-                if (vma_matches(vm->usm.last_fault_vma, page_addr))
-                        vma = vm->usm.last_fault_vma;
-        }
-        if (!vma)
-                vma = xe_vm_find_overlapping_vma(vm, page_addr, SZ_4K);
-
-        return vma;
-}
-
 static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
                        bool atomic, unsigned int id)
 {
@@ -229,7 +206,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
                goto unlock_vm;
        }
 
-       vma = lookup_vma(vm, pf->page_addr);
+       vma = xe_vm_lookup_vma(vm, pf->page_addr);
        if (!vma) {
                err = -EINVAL;
                goto unlock_vm;
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index f66075f8a6fe..fb3f15ee89ec 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -248,6 +248,30 @@ bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
 int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
 
+static inline bool vma_matches(struct xe_vma *vma, u64 page_addr)
+{
+        if (page_addr > xe_vma_end(vma) - 1 ||
+            page_addr + SZ_4K - 1 < xe_vma_start(vma))
+                return false;
+
+        return true;
+}
+
+static inline struct xe_vma *xe_vm_lookup_vma(struct xe_vm *vm, u64 page_addr)
+{
+        struct xe_vma *vma = NULL;
+
+        if (vm->usm.last_fault_vma) {   /* Fast lookup */
+                if (vma_matches(vm->usm.last_fault_vma, page_addr))
+                        vma = vm->usm.last_fault_vma;
+        }
+        if (!vma)
+                vma = xe_vm_find_overlapping_vma(vm, page_addr, SZ_4K);
+
+        return vma;
+}
+
+
 int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
                           unsigned int num_fences);
From patchwork Wed Feb 26 22:55:52 2025
X-Patchwork-Id: 13993288
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 2/6] drm/xe/xe_exec_queue: Add ID param to exec queue struct
Date: Wed, 26 Feb 2025 22:55:52 +0000
Message-ID: <20250226225557.133076-3-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Add the exec queue id to the exec queue struct. This is useful for
performing a reverse lookup into the xef->exec_queue xarray.
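For illustration, a hedged sketch of the reverse lookup this enables
(hypothetical helper, not part of this patch; the xarray member and lock
names are assumptions based on the existing xef->exec_queue bookkeeping):

        /* Hypothetical: check that q->id still maps back to q in its owning file */
        static bool exec_queue_id_is_current(struct xe_exec_queue *q)
        {
                struct xe_file *xef = q->xef;
                bool current_id;

                mutex_lock(&xef->exec_queue.lock);
                current_id = xa_load(&xef->exec_queue.xa, q->id) == q;
                mutex_unlock(&xef->exec_queue.lock);

                return current_id;
        }
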
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue.c       | 1 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 23a9f519ce1c..4a98a5d0e405 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -709,6 +709,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
        if (err)
                goto kill_exec_queue;
 
+       q->id = id;
        args->exec_queue_id = id;
 
        return 0;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 6eb7ff091534..088d838218e9 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -55,6 +55,8 @@ struct xe_exec_queue {
        struct xe_vm *vm;
        /** @class: class of this exec queue */
        enum xe_engine_class class;
+       /** @id: exec queue ID as reported during create ioctl */
+       u32 id;
        /**
         * @logical_mask: logical mask of where job submitted to exec queue can run
         */
From patchwork Wed Feb 26 22:55:53 2025
X-Patchwork-Id: 13993285
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 3/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
Date: Wed, 26 Feb 2025 22:55:53 +0000
Message-ID: <20250226225557.133076-4-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Migrate the pagefault struct from xe_gt_pagefault.c to the
xe_gt_pagefault.h header file, along with the associated enum values.

v2: Normalize names for common header (Matt Brost)

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 41 +++++-----------------------
 drivers/gpu/drm/xe/xe_gt_pagefault.h | 28 +++++++++++++++++++
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 4a4cf0c4b68d..76d7feecf98e 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -22,33 +22,6 @@
 #include "xe_trace_bo.h"
 #include "xe_vm.h"
 
-struct pagefault {
-        u64 page_addr;
-        u32 asid;
-        u16 pdata;
-        u8 vfid;
-        u8 access_type;
-        u8 fault_type;
-        u8 fault_level;
-        u8 engine_class;
-        u8 engine_instance;
-        u8 fault_unsuccessful;
-        bool trva_fault;
-};
-
-enum access_type {
-        ACCESS_TYPE_READ = 0,
-        ACCESS_TYPE_WRITE = 1,
-        ACCESS_TYPE_ATOMIC = 2,
-        ACCESS_TYPE_RESERVED = 3,
-};
-
-enum fault_type {
-        NOT_PRESENT = 0,
-        WRITE_ACCESS_VIOLATION = 1,
-        ATOMIC_ACCESS_VIOLATION = 2,
-};
-
 struct acc {
        u64 va_range_base;
        u32 asid;
@@ -60,9 +33,9 @@ struct acc {
        u8 engine_instance;
 };
 
-static bool access_is_atomic(enum access_type access_type)
+static bool access_is_atomic(enum xe_pagefault_access_type access_type)
 {
-        return access_type == ACCESS_TYPE_ATOMIC;
+        return access_type == XE_PAGEFAULT_ACCESS_TYPE_ATOMIC;
 }
 
 static bool vma_is_valid(struct xe_tile *tile, struct xe_vma *vma)
@@ -102,7 +75,7 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
        return 0;
 }
 
-static int handle_vma_pagefault(struct xe_gt *gt, struct pagefault *pf,
+static int handle_vma_pagefault(struct xe_gt *gt, struct xe_pagefault *pf,
                                struct xe_vma *vma)
 {
        struct xe_vm *vm = xe_vma_vm(vma);
@@ -181,7 +154,7 @@ static struct xe_vm *asid_to_vm(struct xe_device *xe, u32 asid)
        return vm;
 }
 
-static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
+static int handle_pagefault(struct xe_gt *gt, struct xe_pagefault *pf)
 {
        struct xe_device *xe = gt_to_xe(gt);
        struct xe_vm *vm;
@@ -235,7 +208,7 @@ static int send_pagefault_reply(struct xe_guc *guc,
        return xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
 }
 
-static void print_pagefault(struct xe_device *xe, struct pagefault *pf)
+static void print_pagefault(struct xe_device *xe, struct xe_pagefault *pf)
 {
        drm_dbg(&xe->drm, "\n\tASID: %d\n"
                "\tVFID: %d\n"
@@ -255,7 +228,7 @@ static void print_pagefault(struct xe_device *xe, struct pagefault *pf)
 
 #define PF_MSG_LEN_DW 4
 
-static bool get_pagefault(struct pf_queue *pf_queue, struct pagefault *pf)
+static bool get_pagefault(struct pf_queue *pf_queue, struct xe_pagefault *pf)
 {
        const struct xe_guc_pagefault_desc *desc;
        bool ret = false;
@@ -342,7 +315,7 @@ static void pf_queue_work_func(struct work_struct *w)
        struct xe_gt *gt = pf_queue->gt;
        struct xe_device *xe = gt_to_xe(gt);
        struct xe_guc_pagefault_reply reply = {};
-        struct pagefault pf = {};
+        struct xe_pagefault pf = {};
        unsigned long threshold;
        int ret;
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
index 839c065a5e4c..33616043d17a 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
@@ -11,6 +11,34 @@
 struct xe_gt;
 struct xe_guc;
 
+struct xe_pagefault {
+        u64 page_addr;
+        u32 asid;
+        u16 pdata;
+        u8 vfid;
+        u8 access_type;
+        u8 fault_type;
+        u8 fault_level;
+        u8 engine_class;
+        u8 engine_instance;
+        u8 fault_unsuccessful;
+        bool prefetch;
+        bool trva_fault;
+};
+
+enum xe_pagefault_access_type {
+        XE_PAGEFAULT_ACCESS_TYPE_READ = 0,
+        XE_PAGEFAULT_ACCESS_TYPE_WRITE = 1,
+        XE_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
+        XE_PAGEFAULT_ACCESS_TYPE_RESERVED = 3,
+};
+
+enum xe_pagefault_type {
+        XE_PAGEFAULT_TYPE_NOT_PRESENT = 0,
+        XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION = 1,
+        XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION = 2,
+};
+
 int xe_gt_pagefault_init(struct xe_gt *gt);
 void xe_gt_pagefault_reset(struct xe_gt *gt);
 int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len);
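With the struct and enums now exposed in xe_gt_pagefault.h, other
compilation units can interpret fault data directly. A hedged sketch of a
debug helper built on the shared enum (hypothetical, not part of this
series):

        /* Hypothetical: human-readable name for an access type from the shared enum */
        static const char *xe_pagefault_access_type_str(u8 access_type)
        {
                switch (access_type) {
                case XE_PAGEFAULT_ACCESS_TYPE_READ:
                        return "read";
                case XE_PAGEFAULT_ACCESS_TYPE_WRITE:
                        return "write";
                case XE_PAGEFAULT_ACCESS_TYPE_ATOMIC:
                        return "atomic";
                default:
                        return "reserved";
                }
        }
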
From patchwork Wed Feb 26 22:55:54 2025
X-Patchwork-Id: 13993286
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 4/6] drm/xe/xe_vm: Add per VM pagefault info
Date: Wed, 26 Feb 2025 22:55:54 +0000
Message-ID: <20250226225557.133076-5-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Add additional information to the VM so that it can report up to the last
50 exec queues banned on it, together with the last pagefault seen at the
time each ban was recorded. Since a pagefault cannot reasonably be
associated with a specific exec queue, the last pagefault seen on the VM
is reported instead. The pagefault is saved to the VM during pagefault
handling in xe_gt_pagefault, and it is cleared once it has been associated
with the next banned exec queue.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue.c   |  6 +++
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 16 +++++++
 drivers/gpu/drm/xe/xe_guc_submit.c   |  2 +
 drivers/gpu/drm/xe/xe_vm.c           | 69 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h           |  6 +++
 drivers/gpu/drm/xe/xe_vm_types.h     | 31 +++++++++++++
 6 files changed, 130 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 4a98a5d0e405..e0764f3dfd76 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -712,6 +712,12 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
        q->id = id;
        args->exec_queue_id = id;
 
+       /*
+        * If an exec queue in the ban list shares the same exec queue
+        * ID, remove it from the ban list to avoid confusion.
+        */
+       xe_vm_remove_ban_entry(q->vm, q);
+
        return 0;
 
 kill_exec_queue:
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 76d7feecf98e..899b687de3e7 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -307,6 +307,21 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
        return full ? -ENOSPC : 0;
 }
 
+static void save_pagefault_to_vm(struct xe_device *xe, struct xe_pagefault *pf)
+{
+        struct xe_vm *vm;
+
+        vm = asid_to_vm(xe, pf->asid);
+        if (IS_ERR(vm))
+                return;
+
+        spin_lock(&vm->pf.lock);
+        /* Keep only the most recent fault; GFP_ATOMIC because a spinlock is held */
+        kfree(vm->pf.info);
+        vm->pf.info = kmemdup(pf, sizeof(*pf), GFP_ATOMIC);
+        spin_unlock(&vm->pf.lock);
+}
+
 #define USM_QUEUE_MAX_RUNTIME_MS 20
 
 static void pf_queue_work_func(struct work_struct *w)
@@ -325,6 +340,7 @@ static void pf_queue_work_func(struct work_struct *w)
        ret = handle_pagefault(gt, &pf);
        if (unlikely(ret)) {
                print_pagefault(xe, &pf);
+                save_pagefault_to_vm(xe, &pf);
                pf.fault_unsuccessful = 1;
                drm_dbg(&xe->drm, "Fault response: Unsuccessful %d\n", ret);
        }
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index b6a2dd742ebd..f0bfc9d109cb 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2066,6 +2066,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
        if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
                xe_guc_exec_queue_trigger_cleanup(q);
 
+       xe_vm_add_ban_entry(q->vm, q);
+
        return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 996000f2424e..3e88652670e6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -746,6 +746,62 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
                list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }
 
+static void free_ban_entry(struct xe_exec_queue_ban_entry *b)
+{
+        list_del(&b->list);
+        kfree(b->pf);
+        kfree(b);
+}
+
+void xe_vm_add_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+        struct xe_exec_queue_ban_entry *b = NULL;
+        struct xe_file *xef = q->xef;
+
+        b = kzalloc(sizeof(*b), GFP_KERNEL);
+        xe_assert(xef->xe, b);
+
+        spin_lock(&vm->bans.lock);
+        list_add_tail(&b->list, &vm->bans.list);
+        vm->bans.len++;
+        /*
+         * Limit the number of bans in the bans list to prevent memory overuse.
+         */
+        if (vm->bans.len > MAX_BANS) {
+                struct xe_exec_queue_ban_entry *rem =
+                        list_first_entry(&vm->bans.list, struct xe_exec_queue_ban_entry, list);
+
+                free_ban_entry(rem);
+                vm->bans.len--;
+        }
+        spin_unlock(&vm->bans.lock);
+
+        /*
+         * Associate the current pagefault saved to the VM to the ban entry, and clear
+         * the VM pagefault cache. This is still valid if vm->pf.info is NULL.
+         */
+        spin_lock(&vm->pf.lock);
+        b->pf = vm->pf.info;
+        vm->pf.info = NULL;
+        spin_unlock(&vm->pf.lock);
+
+        /* Save blame data to list element */
+        b->exec_queue_id = q->id;
+}
+
+void xe_vm_remove_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q)
+{
+        struct xe_exec_queue_ban_entry *b, *tmp;
+
+        spin_lock(&vm->bans.lock);
+        list_for_each_entry_safe(b, tmp, &vm->bans.list, list)
+                if (b->exec_queue_id == q->id) {
+                        free_ban_entry(b);
+                        vm->bans.len--;
+                }
+        spin_unlock(&vm->bans.lock);
+}
+
 static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
 {
        int i;
@@ -1448,6 +1504,10 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
        init_rwsem(&vm->userptr.notifier_lock);
        spin_lock_init(&vm->userptr.invalidated_lock);
 
+       INIT_LIST_HEAD(&vm->bans.list);
+       spin_lock_init(&vm->bans.lock);
+       spin_lock_init(&vm->pf.lock);
+
        ttm_lru_bulk_move_init(&vm->lru_bulk_move);
 
        INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
@@ -1672,6 +1732,15 @@ void xe_vm_close_and_put(struct xe_vm *vm)
        }
        up_write(&xe->usm.lock);
 
+       if (vm->bans.len) {
+               struct xe_exec_queue_ban_entry *b, *tmp;
+
+               spin_lock(&vm->bans.lock);
+               list_for_each_entry_safe(b, tmp, &vm->bans.list, list)
+                       free_ban_entry(b);
+               spin_unlock(&vm->bans.lock);
+       }
+
        for_each_tile(tile, xe, id)
                xe_range_fence_tree_fini(&vm->rftree[id]);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index fb3f15ee89ec..78dbc5d57cd3 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -12,6 +12,8 @@
 #include "xe_map.h"
 #include "xe_vm_types.h"
 
+#define MAX_BANS 50
+
 struct drm_device;
 struct drm_printer;
 struct drm_file;
@@ -244,6 +246,10 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);
 
 int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);
 
+void xe_vm_add_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q);
+
+void xe_vm_remove_ban_entry(struct xe_vm *vm, struct xe_exec_queue *q);
+
 bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
 int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 52467b9b5348..e7e2d682b1b6 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -18,6 +18,7 @@
 #include "xe_range_fence.h"
 
 struct xe_bo;
+struct xe_pagefault;
 struct xe_sync_entry;
 struct xe_user_fence;
 struct xe_vm;
@@ -135,6 +136,15 @@ struct xe_userptr_vma {
 
 struct xe_device;
 
+struct xe_exec_queue_ban_entry {
+        /** @exec_queue_id: ID number of banned exec queue */
+        u32 exec_queue_id;
+        /** @pf: pagefault on engine of banned exec queue, if any at time */
+        struct xe_pagefault *pf;
+        /** @list: link into @xe_vm.bans.list */
+        struct list_head list;
+};
+
 struct xe_vm {
        /** @gpuvm: base GPUVM used to track VMAs */
        struct drm_gpuvm gpuvm;
@@ -274,6 +284,27 @@ struct xe_vm {
                bool capture_once;
        } error_capture;
 
+        /**
+         * @bans: List of relevant banned exec queues associated with this
+         * vm, as well as any pagefaults at time of ban.
+         */
+        struct {
+                /** @bans.lock: lock protecting @bans.list */
+                spinlock_t lock;
+                /** @bans.list: list of xe_exec_queue_ban_entry entries */
+                struct list_head list;
+                /** @bans.len: length of @bans.list */
+                unsigned int len;
+        } bans;
+
+        /** @pf: the last pagefault seen on this VM */
+        struct {
+                /** @pf.info: info containing last seen pagefault details */
+                struct xe_pagefault *info;
+                /** @pf.lock: lock protecting @pf.info */
+                spinlock_t lock;
+        } pf;
+
        /**
         * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
         * protected by the vm resv.
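To show how the new bookkeeping is meant to be consumed, here is a hedged
sketch of a reader that walks the per-VM ban list under its lock
(hypothetical debug printer, not part of this patch; the real consumer is
the property ioctl added later in this series):

        /* Hypothetical: dump the VM's recent bans and any fault recorded with them */
        static void print_vm_bans(struct drm_printer *p, struct xe_vm *vm)
        {
                struct xe_exec_queue_ban_entry *b;

                spin_lock(&vm->bans.lock);
                list_for_each_entry(b, &vm->bans.list, list) {
                        struct xe_pagefault *pf = b->pf;

                        drm_printf(p, "exec queue %u banned%s\n", b->exec_queue_id,
                                   pf ? ", pagefault recorded at time of ban" : "");
                        if (pf)
                                drm_printf(p, "  fault addr: 0x%016llx, asid: %u\n",
                                           pf->page_addr, pf->asid);
                }
                spin_unlock(&vm->bans.lock);
        }
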
From patchwork Wed Feb 26 22:55:55 2025
X-Patchwork-Id: 13993287
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 5/6] drm/xe/xe_vm: Add per VM reset stats
Date: Wed, 26 Feb 2025 22:55:55 +0000
Message-ID: <20250226225557.133076-6-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Add a counter to xe_vm that tracks the number of times an engine reset
has been observed with respect to the VM since creation.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
 drivers/gpu/drm/xe/xe_vm_types.h   | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f0bfc9d109cb..e4c2413ed47e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1990,6 +1990,8 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
 
        trace_xe_exec_queue_reset(q);
 
+       atomic_inc(&q->vm->reset_count);
+
        /*
         * A banned engine is a NOP at this point (came from
         * guc_exec_queue_timedout_job). Otherwise, kick drm scheduler to cancel
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index e7e2d682b1b6..a448402250e5 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -305,6 +305,9 @@ struct xe_vm {
                spinlock_t lock;
        } pf;
 
+       /** @reset_count: number of times this VM has seen an engine reset */
+       atomic_t reset_count;
+
        /**
         * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
         * protected by the vm resv.
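A hedged sketch of how the counter could be surfaced for debugging
(hypothetical printer, not part of this patch; the uapi consumer follows
in the next patch):

        /* Hypothetical: include the reset count in a VM debug dump */
        static void print_vm_reset_stats(struct drm_printer *p, struct xe_vm *vm)
        {
                drm_printf(p, "engine resets observed: %d\n",
                           atomic_read(&vm->reset_count));
        }
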
From patchwork Wed Feb 26 22:55:56 2025
X-Patchwork-Id: 13993289
From: Jonathan Cavitt <jonathan.cavitt@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: saurabhg.gupta@intel.com, alex.zuo@intel.com, jonathan.cavitt@intel.com,
 joonas.lahtinen@linux.intel.com, matthew.brost@intel.com,
 jianxun.zhang@intel.com, dri-devel@lists.freedesktop.org
Subject: [PATCH 6/6] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
Date: Wed, 26 Feb 2025 22:55:56 +0000
Message-ID: <20250226225557.133076-7-jonathan.cavitt@intel.com>
In-Reply-To: <20250226225557.133076-1-jonathan.cavitt@intel.com>
References: <20250226225557.133076-1-jonathan.cavitt@intel.com>

Add support for userspace to get various properties from a specified VM.
The currently supported properties are:

- The number of engine resets the VM has observed.
- The number of exec queue bans the VM has observed, up to the last 50
  relevant ones, and how many of those were caused by faults.

The latter request also reports information on each exec queue ban: the ID
of the banned exec queue, whether the ban was caused by a pagefault, and
the address and address type of the associated fault (if one exists).
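A hedged sketch of the intended userspace flow for the new ioctl
(illustrative only; error handling is trimmed, the uapi header path may
differ, and the names used are the ones introduced below):

        #include <stdint.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>
        #include "xe_drm.h"

        /* Illustrative only: two-call pattern to fetch the fault/ban report of a VM */
        static struct drm_xe_faults *query_vm_faults(int fd, uint32_t vm_id)
        {
                struct drm_xe_vm_get_property args = {
                        .vm_id = vm_id,
                        .property = DRM_XE_VM_GET_PROPERTY_FAULTS,
                };
                struct drm_xe_faults *faults;

                /* First call: args.size == 0, the kernel reports the required size */
                if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args))
                        return NULL;

                faults = calloc(1, args.size);
                if (!faults)
                        return NULL;

                /* Second call: pass a buffer of the reported size to be filled */
                args.data = (uintptr_t)faults;
                if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args)) {
                        free(faults);
                        return NULL;
                }

                return faults; /* caller frees; faults->num_bans entries in list[] */
        }
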
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c |   2 +
 drivers/gpu/drm/xe/xe_vm.c     | 106 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h     |   2 +
 include/uapi/drm/xe_drm.h      |  73 +++++++++++++++++++++++
 4 files changed, 183 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 9454b51f7ad8..3a509a69062c 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -193,6 +193,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
        DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
                          DRM_RENDER_ALLOW),
        DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+       DRM_IOCTL_DEF_DRV(XE_VM_GET_PROPERTY, xe_vm_get_property_ioctl,
+                         DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3e88652670e6..047908eb9ff7 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3258,6 +3258,112 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        return err;
 }
 
+static int xe_vm_get_property_size(struct xe_vm *vm, u32 property)
+{
+        int size = -EINVAL;
+
+        switch (property) {
+        case DRM_XE_VM_GET_PROPERTY_FAULTS:
+                spin_lock(&vm->bans.lock);
+                size = vm->bans.len * sizeof(struct drm_xe_ban);
+                spin_unlock(&vm->bans.lock);
+                size += sizeof(struct drm_xe_faults);
+                break;
+        case DRM_XE_VM_GET_PROPERTY_NUM_RESETS:
+                size = sizeof(u64);
+                break;
+        default:
+                break;
+        }
+
+        return size;
+}
+
+static enum drm_xe_fault_address_type
+xe_pagefault_access_type_to_address_type(struct xe_vm *vm, struct xe_pagefault *pf)
+{
+        struct xe_vma *vma = pf ? xe_vm_lookup_vma(vm, pf->page_addr) : NULL;
+
+        if (!pf)
+                return 0;
+        if (!vma)
+                return DRM_XE_FAULT_ADDRESS_TYPE_NONE_EXT;
+        if (xe_vma_read_only(vma) && pf->access_type != XE_PAGEFAULT_ACCESS_TYPE_READ)
+                return DRM_XE_FAULT_ADDRESS_TYPE_WRITE_INVALID_EXT;
+        return 0;
+}
+
+int xe_vm_get_property_ioctl(struct drm_device *drm, void *data,
+                             struct drm_file *file)
+{
+        struct xe_device *xe = to_xe_device(drm);
+        struct xe_file *xef = to_xe_file(file);
+        struct drm_xe_vm_get_property *args = data;
+        struct xe_vm *vm;
+        int size;
+
+        if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+                return -EINVAL;
+
+        vm = xe_vm_lookup(xef, args->vm_id);
+        if (XE_IOCTL_DBG(xe, !vm))
+                return -ENOENT;
+
+        size = xe_vm_get_property_size(vm, args->property);
+        if (size < 0) {
+                return size;
+        } else if (!args->size) {
+                args->size = size;
+                return 0;
+        } else if (args->size != size) {
+                return -EINVAL;
+        }
+
+        if (args->property == DRM_XE_VM_GET_PROPERTY_FAULTS) {
+                struct drm_xe_faults __user *usr_ptr = u64_to_user_ptr(args->data);
+                struct drm_xe_faults *fault_list __free(kfree) = kzalloc(size, GFP_KERNEL);
+                struct drm_xe_ban *ban;
+                struct xe_exec_queue_ban_entry *entry;
+                int i = 0;
+
+                if (!fault_list)
+                        return -ENOMEM;
+
+                fault_list->num_faults = 0;
+
+                spin_lock(&vm->bans.lock);
+                list_for_each_entry(entry, &vm->bans.list, list) {
+                        struct xe_pagefault *pf = entry->pf;
+
+                        ban = &fault_list->list[i++];
+                        ban->exec_queue_id = entry->exec_queue_id;
+                        ban->faulted = pf ? 1 : 0;
+                        ban->address = pf ? pf->page_addr : 0;
+                        ban->address_type =
+                                xe_pagefault_access_type_to_address_type(vm, pf);
+                        fault_list->num_faults += ban->faulted;
+                }
+                spin_unlock(&vm->bans.lock);
+
+                fault_list->num_bans = i;
+
+                if (copy_to_user(usr_ptr, fault_list, size))
+                        return -EFAULT;
+
+        } else if (args->property == DRM_XE_VM_GET_PROPERTY_NUM_RESETS) {
+                u64 __user *usr_ptr = u64_to_user_ptr(args->data);
+                u64 num_resets = atomic_read(&vm->reset_count);
+
+                if (copy_to_user(usr_ptr, &num_resets, size))
+                        return -EFAULT;
+
+        } else {
+                return -EINVAL;
+        }
+
+        return 0;
+}
+
 /**
  * xe_vm_bind_kernel_bo - bind a kernel BO to a VM
  * @vm: VM to bind the BO to
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 78dbc5d57cd3..84653539d8db 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -184,6 +184,8 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
                        struct drm_file *file);
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
                     struct drm_file *file);
+int xe_vm_get_property_ioctl(struct drm_device *dev, void *data,
+                            struct drm_file *file);
 
 void xe_vm_close_and_put(struct xe_vm *vm);
 
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 76a462fae05f..00328d8a15dd 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -81,6 +81,7 @@ extern "C" {
  *  - &DRM_IOCTL_XE_EXEC
  *  - &DRM_IOCTL_XE_WAIT_USER_FENCE
  *  - &DRM_IOCTL_XE_OBSERVATION
+ *  - &DRM_IOCTL_XE_VM_GET_PROPERTY
  */
 
 /*
@@ -102,6 +103,7 @@ extern "C" {
 #define DRM_XE_EXEC                    0x09
 #define DRM_XE_WAIT_USER_FENCE         0x0a
 #define DRM_XE_OBSERVATION             0x0b
+#define DRM_XE_VM_GET_PROPERTY         0x0c
 
 /* Must be kept compact -- no holes */
 
@@ -117,6 +119,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC                      DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE           DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION               DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_VM_GET_PROPERTY           DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_GET_PROPERTY, struct drm_xe_vm_get_property)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1166,6 +1169,76 @@ struct drm_xe_vm_bind {
        __u64 reserved[2];
 };
 
+/** Types of fault address */
+enum drm_xe_fault_address_type {
+        DRM_XE_FAULT_ADDRESS_TYPE_NONE_EXT,
+        DRM_XE_FAULT_ADDRESS_TYPE_READ_INVALID_EXT,
+        DRM_XE_FAULT_ADDRESS_TYPE_WRITE_INVALID_EXT,
+};
+
+struct drm_xe_ban {
+        /** @exec_queue_id: ID of banned exec queue */
+        __u32 exec_queue_id;
+        /** @faulted: Whether or not the ban has an associated pagefault. 0 is no, 1 is yes */
+        __u32 faulted;
+        /** @address: Address of the fault, if relevant */
+        __u64 address;
+        /** @address_type: enum drm_xe_fault_address_type, if relevant */
+        __u32 address_type;
+        /** @pad: MBZ */
+        __u32 pad;
+        /** @reserved: MBZ */
+        __u64 reserved[3];
+};
+
+struct drm_xe_faults {
+        /** @num_faults: Number of faults observed on the VM */
+        __u32 num_faults;
+        /** @num_bans: Number of bans observed on the VM */
+        __u32 num_bans;
+        /** @reserved: MBZ */
+        __u64 reserved[2];
+        /** @list: Dynamic sized array of drm_xe_ban bans */
+        struct drm_xe_ban list[];
+};
+/**
+ * struct drm_xe_vm_get_property - Input of &DRM_IOCTL_XE_VM_GET_PROPERTY
+ *
+ * The user provides a VM ID and a property to query to this ioctl,
+ * and the ioctl returns the size of the return value. Calling the
+ * ioctl again with memory reserved for the data will save the
+ * requested property data to the data pointer.
+ *
+ * The valid properties are:
+ * - %DRM_XE_VM_GET_PROPERTY_FAULTS: Property is a drm_xe_faults struct of dynamic size
+ * - %DRM_XE_VM_GET_PROPERTY_NUM_RESETS: Property is a scalar
+ */
+struct drm_xe_vm_get_property {
+        /** @extensions: Pointer to the first extension struct, if any */
+        __u64 extensions;
+
+        /** @vm_id: The ID of the VM to query the properties of */
+        __u32 vm_id;
+
+#define DRM_XE_VM_GET_PROPERTY_FAULTS          0
+#define DRM_XE_VM_GET_PROPERTY_NUM_RESETS      1
+        /** @property: The property to get */
+        __u32 property;
+
+        /** @size: Size of returned property @data */
+        __u32 size;
+
+        /** @pad: MBZ */
+        __u32 pad;
+
+        /** @reserved: MBZ */
+        __u64 reserved[2];
+
+        /** @data: Pointer storing return data */
+        __u64 data;
+};
+
 /**
  * struct drm_xe_exec_queue_create - Input of &DRM_IOCTL_XE_EXEC_QUEUE_CREATE
  *