diff mbox series

drm/i915: Fix a VMA UAF for multi-gt platform

Message ID 20230605201021.13928-1-nirmoy.das@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Fix a VMA UAF for multi-gt platform | expand

Commit Message

Nirmoy Das June 5, 2023, 8:10 p.m. UTC
Ensure correct handling of closed VMAs on multi-gt platforms to prevent
Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
exclusively added to GT0's closed_vma link (gt->closed_vma) and
subsequently freed by i915_vma_parked(), which assumes the entire GPU is
idle. However, on platforms with multiple GTs, such as MTL, GT1 may
remain active while GT0 is idle. This causes GT0 to mistakenly consider
the closed VMAs in its closed_vma list as unnecessary, potentially
leading to Use-After-Free issues if a job for GT1 attempts to access a
freed VMA.

Although we do take a wakeref for GT0 but it happens later, after
evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
early.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Andi Shyti June 5, 2023, 8:27 p.m. UTC | #1
Hi Nirmoy,

On Mon, Jun 05, 2023 at 10:10:21PM +0200, Nirmoy Das wrote:
> Ensure correct handling of closed VMAs on multi-gt platforms to prevent
> Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
> exclusively added to GT0's closed_vma link (gt->closed_vma) and
> subsequently freed by i915_vma_parked(), which assumes the entire GPU is
> idle. However, on platforms with multiple GTs, such as MTL, GT1 may
> remain active while GT0 is idle. This causes GT0 to mistakenly consider
> the closed VMAs in its closed_vma list as unnecessary, potentially
> leading to Use-After-Free issues if a job for GT1 attempts to access a
> freed VMA.
> 
> Although we do take a wakeref for GT0 but it happens later, after
> evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
> early.

hooray! this is great, Nirmoy! I will give it a shot.

> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5fb459ea4294..adcf8837dfe6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2692,6 +2692,7 @@ static int
>  eb_select_engine(struct i915_execbuffer *eb)
>  {
>  	struct intel_context *ce, *child;
> +	struct intel_gt *gt;
>  	unsigned int idx;
>  	int err;
>  
> @@ -2715,10 +2716,16 @@ eb_select_engine(struct i915_execbuffer *eb)
>  		}
>  	}
>  	eb->num_batches = ce->parallel.number_children + 1;
> +	gt = ce->engine->gt;
>  
>  	for_each_child(ce, child)
>  		intel_context_get(child);
>  	intel_gt_pm_get(ce->engine->gt);
> +	/* Keep GT0 active on MTL so that i915_vma_parked() doesn't
> +	 * free VMAs while execbuf ioctl is validating VMAs.
> +	 */
> +	if (gt != to_gt(gt->i915))

you can use gt->info.id

> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
>  
>  	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>  		err = intel_context_alloc_state(ce);
> @@ -2757,6 +2764,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>  	return err;
>  
>  err:
> +	if (ce->engine->gt != to_gt(ce->engine->gt->i915))

	if (gt->info.id)

gt is already ce->engine->gt

> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
> +
>  	intel_gt_pm_put(ce->engine->gt);
>  	for_each_child(ce, child)
>  		intel_context_put(child);
> @@ -2770,6 +2780,8 @@ eb_put_engine(struct i915_execbuffer *eb)
>  	struct intel_context *child;
>  
>  	i915_vm_put(eb->context->vm);
> +	if (eb->gt != to_gt(eb->gt->i915))
> +		intel_gt_pm_put(to_gt(eb->gt->i915));

this wakeref going up and down is a bit ugly... Perhaps we can
add some flag about the GT type in the info structure. MTL is a
weird multi-gt platform and, indeed, you can't shut down GT0
without affecting GT1.

For now it's OK, though, as to test it.

Andi

>  	intel_gt_pm_put(eb->gt);
>  	for_each_child(eb->context, child)
>  		intel_context_put(child);
> -- 
> 2.39.0
Nirmoy Das June 6, 2023, 9:06 a.m. UTC | #2
On 6/5/2023 10:27 PM, Andi Shyti wrote:
> Hi Nirmoy,
>
> On Mon, Jun 05, 2023 at 10:10:21PM +0200, Nirmoy Das wrote:
>> Ensure correct handling of closed VMAs on multi-gt platforms to prevent
>> Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
>> exclusively added to GT0's closed_vma link (gt->closed_vma) and
>> subsequently freed by i915_vma_parked(), which assumes the entire GPU is
>> idle. However, on platforms with multiple GTs, such as MTL, GT1 may
>> remain active while GT0 is idle. This causes GT0 to mistakenly consider
>> the closed VMAs in its closed_vma list as unnecessary, potentially
>> leading to Use-After-Free issues if a job for GT1 attempts to access a
>> freed VMA.
>>
>> Although we do take a wakeref for GT0 but it happens later, after
>> evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
>> early.
> hooray! this is great, Nirmoy! I will give it a shot.
>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Chris Wilson <chris.p.wilson@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 5fb459ea4294..adcf8837dfe6 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -2692,6 +2692,7 @@ static int
>>   eb_select_engine(struct i915_execbuffer *eb)
>>   {
>>   	struct intel_context *ce, *child;
>> +	struct intel_gt *gt;
>>   	unsigned int idx;
>>   	int err;
>>   
>> @@ -2715,10 +2716,16 @@ eb_select_engine(struct i915_execbuffer *eb)
>>   		}
>>   	}
>>   	eb->num_batches = ce->parallel.number_children + 1;
>> +	gt = ce->engine->gt;
>>   
>>   	for_each_child(ce, child)
>>   		intel_context_get(child);
>>   	intel_gt_pm_get(ce->engine->gt);
>> +	/* Keep GT0 active on MTL so that i915_vma_parked() doesn't
>> +	 * free VMAs while execbuf ioctl is validating VMAs.
>> +	 */
>> +	if (gt != to_gt(gt->i915))
> you can use gt->info.id
>
>> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
>>   
>>   	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>   		err = intel_context_alloc_state(ce);
>> @@ -2757,6 +2764,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>>   	return err;
>>   
>>   err:
>> +	if (ce->engine->gt != to_gt(ce->engine->gt->i915))
> 	if (gt->info.id)

Thanks, was wondering if there is a better way to detect that.


>
> gt is already ce->engine->gt
>
>> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
>> +
>>   	intel_gt_pm_put(ce->engine->gt);
>>   	for_each_child(ce, child)
>>   		intel_context_put(child);
>> @@ -2770,6 +2780,8 @@ eb_put_engine(struct i915_execbuffer *eb)
>>   	struct intel_context *child;
>>   
>>   	i915_vm_put(eb->context->vm);
>> +	if (eb->gt != to_gt(eb->gt->i915))
>> +		intel_gt_pm_put(to_gt(eb->gt->i915));
> this wakeref going up and down is a bit ugly... Perhaps we can
> add some flag about the GT type in the info structure.


Chris pointed out that there is a patch[1] in internal branch which 
fixes a regression by moving the

intel_context_enter() before vma lookup. That should have same effect as 
this one.

But for now I didn't mange to make it work on drm-tip(causing some 
crash) yet.

[1] drm/i915/gem: Push wait-for-execbuf outside of ww_mutex

>   MTL is a
> weird multi-gt platform and, indeed, you can't shut down GT0
> without affecting GT1.
>
> For now it's OK, though, as to test it.

Looking forward to that. I did test it extensively and ChromeOS team as 
well.


Regards,

Nirmoy

>
> Andi
>
>>   	intel_gt_pm_put(eb->gt);
>>   	for_each_child(eb->context, child)
>>   		intel_context_put(child);
>> -- 
>> 2.39.0
Andi Shyti June 6, 2023, 8:11 p.m. UTC | #3
Hi Nirmoy,

> >   MTL is a
> > weird multi-gt platform and, indeed, you can't shut down GT0
> > without affecting GT1.
> > 
> > For now it's OK, though, as to test it.
> 
> Looking forward to that. I did test it extensively and ChromeOS team as
> well.

great job, Nirmoy! I haven't been able to reproduce the issue.
This is a great news!

Tested-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5fb459ea4294..adcf8837dfe6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2692,6 +2692,7 @@  static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
 	struct intel_context *ce, *child;
+	struct intel_gt *gt;
 	unsigned int idx;
 	int err;
 
@@ -2715,10 +2716,16 @@  eb_select_engine(struct i915_execbuffer *eb)
 		}
 	}
 	eb->num_batches = ce->parallel.number_children + 1;
+	gt = ce->engine->gt;
 
 	for_each_child(ce, child)
 		intel_context_get(child);
 	intel_gt_pm_get(ce->engine->gt);
+	/* Keep GT0 active on MTL so that i915_vma_parked() doesn't
+	 * free VMAs while execbuf ioctl is validating VMAs.
+	 */
+	if (gt != to_gt(gt->i915))
+		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
 
 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
 		err = intel_context_alloc_state(ce);
@@ -2757,6 +2764,9 @@  eb_select_engine(struct i915_execbuffer *eb)
 	return err;
 
 err:
+	if (ce->engine->gt != to_gt(ce->engine->gt->i915))
+		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
+
 	intel_gt_pm_put(ce->engine->gt);
 	for_each_child(ce, child)
 		intel_context_put(child);
@@ -2770,6 +2780,8 @@  eb_put_engine(struct i915_execbuffer *eb)
 	struct intel_context *child;
 
 	i915_vm_put(eb->context->vm);
+	if (eb->gt != to_gt(eb->gt->i915))
+		intel_gt_pm_put(to_gt(eb->gt->i915));
 	intel_gt_pm_put(eb->gt);
 	for_each_child(eb->context, child)
 		intel_context_put(child);