
[v6,0/6] drm/xe/xe_vm: Implement xe_vm_get_faults_ioctl

Message ID 20250307224125.111430-1-jonathan.cavitt@intel.com (mailing list archive)

Message

Jonathan Cavitt March 7, 2025, 10:41 p.m. UTC
Add per-VM bookkeeping so that each VM can report up to the first 50
pagefaults it sees.  Only failed pagefaults are saved this way, as
successful pagefaults recover and do not need to be reported to
userspace.
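
The retention policy described above (keep the first 50 failed pagefaults,
drop the rest) can be modeled in a few lines of standalone C.  This is only
a userspace sketch, not the code from the series: struct fault_record,
struct vm_fault_log and vm_fault_log_save() are hypothetical names, and the
record fields are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAVED_FAULTS 50	/* cap taken from the cover letter */

/* Hypothetical record; the real pagefault struct is defined by the series. */
struct fault_record {
	uint64_t address;
	uint8_t access_type;
};

/* Hypothetical per-VM log: a fixed array plus a fill count. */
struct vm_fault_log {
	struct fault_record entries[MAX_SAVED_FAULTS];
	size_t count;
};

/*
 * Save a fault only if it failed and the log still has room.  Once the
 * log is full, later faults are dropped, so the first 50 are kept.
 * Returns true if the fault was stored.
 */
static bool vm_fault_log_save(struct vm_fault_log *log,
			      const struct fault_record *rec, bool failed)
{
	assert(log && rec);

	if (!failed || log->count >= MAX_SAVED_FAULTS)
		return false;

	log->entries[log->count++] = *rec;
	return true;
}
```

Dropping new entries once the log is full is what makes this "first 50"
rather than "last 50"; a ring buffer that overwrites old entries would give
the opposite behavior.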

Additionally, add a new ioctl, xe_vm_get_faults_ioctl, that allows the
user to query these pagefaults.

v2: (Matt Brost)
- Break full ban list request into a separate property.
- Reformat drm_xe_vm_get_property struct.
- Remove need for drm_xe_faults helper struct.
- Separate data pointer and scalar return value in ioctl.
- Get address type on pagefault report and save it to the pagefault.
- Correctly reject writes to read-only VMAs.
- Miscellaneous formatting fixes.
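
For illustration, the read-only check mentioned in the v2 notes can be
modeled as below.  This is a standalone sketch with hypothetical names
(struct vma_model, enum fault_access, check_fault_access()); the -EPERM
return value and the choice to treat atomic accesses like writes are
assumptions, not details confirmed by this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access types carried by a pagefault. */
enum fault_access {
	FAULT_ACCESS_READ,
	FAULT_ACCESS_WRITE,
	FAULT_ACCESS_ATOMIC,
};

/* Hypothetical stand-in for a VMA; only the read-only flag matters here. */
struct vma_model {
	bool read_only;
};

/*
 * Return 0 if the access is allowed on this VMA, or -EPERM if a
 * non-read access targets a read-only VMA (assumed error code).
 */
static int check_fault_access(const struct vma_model *vma,
			      enum fault_access access)
{
	assert(vma != NULL);

	if (vma->read_only && access != FAULT_ACCESS_READ)
		return -EPERM;

	return 0;
}
```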

v3: (Matt Brost)
- Only allow querying of failed pagefaults.

v4:
- Remove unnecessary size parameter from helper function, as it
  is a property of the arguments. (jcavitt)
- Remove unnecessary copy_from_user (Jianxun)
- Set address_precision to 1 (Jianxun)
- Report max size instead of dynamic size for memory allocation
  purposes.  Total memory usage is reported separately.
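
One way to read the last v4 item is as a two-call query: a first call
reports the maximum buffer size so the caller can allocate, and a second
call fills the buffer and reports how many bytes were actually used.  The
sketch below models that flow in plain C.  It is not the uapi from this
series: struct fault_query, its fields, fault_query_run() and the
"size == 0 means report the size" convention are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FAULTS 50	/* cap taken from the cover letter */

/* Hypothetical fault entry and per-VM store. */
struct fault_entry {
	uint64_t address;
};

struct fault_store {
	struct fault_entry entries[MAX_FAULTS];
	uint32_t count;
};

/* Hypothetical query arguments, not the real ioctl struct. */
struct fault_query {
	uint32_t size;		/* in: buffer bytes, 0 to ask; out: max bytes */
	uint32_t bytes_used;	/* out: bytes actually written to data */
	struct fault_entry *data;
};

static int fault_query_run(const struct fault_store *store,
			   struct fault_query *q)
{
	const uint32_t max_size =
		(uint32_t)(MAX_FAULTS * sizeof(struct fault_entry));
	uint32_t used;

	assert(store && q);

	/* First call: report the fixed maximum so the caller can allocate. */
	if (q->size == 0) {
		q->size = max_size;
		q->bytes_used = 0;
		return 0;
	}

	/* Second call: the buffer must be able to hold the maximum. */
	if (q->size < max_size || !q->data)
		return -EINVAL;

	used = (uint32_t)(store->count * sizeof(struct fault_entry));
	memcpy(q->data, store->entries, used);
	q->bytes_used = used;
	return 0;
}
```

Reporting a fixed maximum rather than the current size means a fault that
arrives between the two calls cannot make the buffer too small.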

v5:
- Return int from xe_vm_get_property_size (Shuicheng)
- Fix memory leak (Shuicheng)
- Remove unnecessary size variable (jcavitt)

v6:
- Free vm after use (Shuicheng)
- Compress pf copy logic (Shuicheng)
- Update fault_unsuccessful before storing (Shuicheng)
- Fix old struct name in comments (Shuicheng)
- Keep first 50 pagefaults instead of last 50 (Jianxun)
- Rename ioctl to xe_vm_get_faults_ioctl (jcavitt)

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
CC: Zhang Jianxun <jianxun.zhang@intel.com>
CC: Shuicheng Lin <shuicheng.lin@intel.com>

Jonathan Cavitt (6):
  drm/xe/xe_gt_pagefault: Disallow writes to read-only VMAs
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_vm: Add per VM pagefault info
  drm/xe/uapi: Define drm_xe_vm_get_faults
  drm/xe/xe_gt_pagefault: Add address_type field to pagefaults
  drm/xe/xe_vm: Implement xe_vm_get_faults_ioctl

 drivers/gpu/drm/xe/xe_device.c       |   3 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  64 +++++++-------
 drivers/gpu/drm/xe/xe_gt_pagefault.h |  29 +++++++
 drivers/gpu/drm/xe/xe_vm.c           | 120 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h           |   8 ++
 drivers/gpu/drm/xe/xe_vm_types.h     |  20 +++++
 include/uapi/drm/xe_drm.h            |  49 +++++++++++
 7 files changed, 260 insertions(+), 33 deletions(-)

Comments

Lin, Shuicheng March 7, 2025, 11:25 p.m. UTC | #1
One general question: do we have a test case to verify that the function works correctly? Thanks.
I think we could have one IGT test case trigger the fault, then have another IGT test case query it with this new uapi,
and make sure we get the expected data back.

Shuicheng 
