Message ID | 20220325175839.2717499-1-daniele.ceraolospurio@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: fix remaining_timeout in intel_gt_retire_requests_timeout | expand |
On Fri, Mar 25, 2022 at 10:58:39AM -0700, Daniele Ceraolo Spurio wrote: > In intel_gt_wait_for_idle, we use the remaining timeout returned from > intel_gt_retire_requests_timeout to wait on the GuC being idle. However, > the returned variable can have a negative value if something goes wrong > during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait > function. > To fix this, make sure to only return the timeout if it is positive. > > Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > Cc: Matthew Brost <matthew.brost@intel.com> > Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c > index edb881d756309..ef70c209976d8 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c > @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock); > active_count++; > > if (remaining_timeout) > - *remaining_timeout = timeout; > + *remaining_timeout = timeout > 0 ? timeout : 0; > > return active_count ? timeout : 0; > } > -- > 2.25.1 >
On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote: > In intel_gt_wait_for_idle, we use the remaining timeout returned from > intel_gt_retire_requests_timeout to wait on the GuC being idle. However, > the returned variable can have a negative value if something goes wrong > during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait > function. > To fix this, make sure to only return the timeout if it is positive. > > Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > Cc: Matthew Brost <matthew.brost@intel.com> > Cc: John Harrison <john.c.harrison@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c > index edb881d756309..ef70c209976d8 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c > @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock); > active_count++; > > if (remaining_timeout) > - *remaining_timeout = timeout; > + *remaining_timeout = timeout > 0 ? timeout : 0; Should the last flush_submission() be "if ( timeout > 0 &&flush_submission(gt, timeout))" ? Nirmoy > > return active_count ? timeout : 0; > }
On 3/25/2022 11:37 AM, Das, Nirmoy wrote: > > On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote: >> In intel_gt_wait_for_idle, we use the remaining timeout returned from >> intel_gt_retire_requests_timeout to wait on the GuC being idle. However, >> the returned variable can have a negative value if something goes wrong >> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait >> function. >> To fix this, make sure to only return the timeout if it is positive. >> >> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to >> work with GuC") >> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >> Cc: Matthew Brost <matthew.brost@intel.com> >> Cc: John Harrison <john.c.harrison@intel.com> >> --- >> drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c >> b/drivers/gpu/drm/i915/gt/intel_gt_requests.c >> index edb881d756309..ef70c209976d8 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c >> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c >> @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock); >> active_count++; >> if (remaining_timeout) >> - *remaining_timeout = timeout; >> + *remaining_timeout = timeout > 0 ? timeout : 0; > > > Should the last flush_submission() be "if ( timeout > 0 > &&flush_submission(gt, timeout))" ? I considered it, but flush_submission only checks for timeout != 0 so it won't accidentally make use of a negative value thinking it's positive. I don't know if the flush is purposely done even if timeout is negative or if that's a mistake, but that code has been there long before we modified the function to return the remaining timeout and never seems to have caused issues, so I decided not to change it. Daniele > > > Nirmoy > >> return active_count ? timeout : 0; >> }
On 3/25/2022 9:33 PM, Ceraolo Spurio, Daniele wrote: > > > On 3/25/2022 11:37 AM, Das, Nirmoy wrote: >> >> On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote: >>> In intel_gt_wait_for_idle, we use the remaining timeout returned from >>> intel_gt_retire_requests_timeout to wait on the GuC being idle. >>> However, >>> the returned variable can have a negative value if something goes wrong >>> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait >>> function. >>> To fix this, make sure to only return the timeout if it is positive. >>> >>> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle >>> to work with GuC") >>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >>> Cc: Matthew Brost <matthew.brost@intel.com> >>> Cc: John Harrison <john.c.harrison@intel.com> >>> --- >>> drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c >>> b/drivers/gpu/drm/i915/gt/intel_gt_requests.c >>> index edb881d756309..ef70c209976d8 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c >>> @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock); >>> active_count++; >>> if (remaining_timeout) >>> - *remaining_timeout = timeout; >>> + *remaining_timeout = timeout > 0 ? timeout : 0; >> >> >> Should the last flush_submission() be "if ( timeout > 0 >> &&flush_submission(gt, timeout))" ? > > I considered it, but flush_submission only checks for timeout != 0 so > it won't accidentally make use of a negative value thinking it's > positive. I don't know if the flush is purposely done even if timeout > is negative or if that's a mistake, but that code has been there long > before we modified the function to return the remaining timeout and > never seems to have caused issues, so I decided not to change it. Yes, we need clarify if we really need the final flush if the timeout is negative. But this patch is Acked-by: Nirmoy Das <nirmoy.das@intel.com> Nirmoy > > Daniele > >> >> >> Nirmoy >> >>> return active_count ? timeout : 0; >>> } >
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index edb881d756309..ef70c209976d8 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock); active_count++; if (remaining_timeout) - *remaining_timeout = timeout; + *remaining_timeout = timeout > 0 ? timeout : 0; return active_count ? timeout : 0; }
In intel_gt_wait_for_idle, we use the remaining timeout returned from intel_gt_retire_requests_timeout to wait on the GuC being idle. However, the returned variable can have a negative value if something goes wrong during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait function. To fix this, make sure to only return the timeout if it is positive. Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> --- drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)