diff mbox

[2/3] drm/i915/guc: downgrade some DRM_ERROR() messages to DRM_WARN()

Message ID 1468260090-2756-2-git-send-email-david.s.gordon@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dave Gordon July 11, 2016, 6:01 p.m. UTC
Where we're going to continue regardless of the problem, rather than
fail, then the message should be a WARNing rather than an ERROR.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

Comments

Tvrtko Ursulin July 12, 2016, 9:20 a.m. UTC | #1
On 11/07/16 19:01, Dave Gordon wrote:
> Where we're going to continue regardless of the problem, rather than
> fail, then the message should be a WARNing rather than an ERROR.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_guc_submission.c | 18 +++++++-----------
>   1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 2112e02..e299b64 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -114,10 +114,8 @@ static int host2guc_action(struct intel_guc *guc, u32 *data, u32 len)
>   		if (ret != -ETIMEDOUT)
>   			ret = -EIO;
>
> -		DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
> -				"status=0x%08X response=0x%08X\n",
> -				data[0], ret, status,
> -				I915_READ(SOFT_SCRATCH(15)));
> +		DRM_WARN("Action 0x%X failed; ret=%d status=0x%08X response=0x%08X\n",
> +			 data[0], ret, status, I915_READ(SOFT_SCRATCH(15)));

Hm, this does propagate the error code to the callers some which will 
act and log the failure. Majority won't though - like suspend/resume 
etc. In those cases it feels more like an error than a warning.

>
>   		dev_priv->guc.action_fail += 1;
>   		dev_priv->guc.action_err = ret;
> @@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
>   		if (db_ret.db_status == GUC_DOORBELL_DISABLED)
>   			break;
>
> -		DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
> -			  db_cmp.cookie, db_ret.cookie);
> +		DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
> +			 db_cmp.cookie, db_ret.cookie);

This one is interesting, error is propagated out a bit but then ignored 
in actual command submission.

If the above message means command will not be submitted error is 
probably more appropriate. Or perhaps we cannot tell if the command was 
submitted or not in this case?

>
>   		/* update the cookie to newly read cookie from GuC */
>   		db_cmp.cookie = db_ret.cookie;
> @@ -726,8 +724,8 @@ static void guc_init_doorbell_hw(struct intel_guc *guc)
>   	/* Restore to original value */
>   	err = guc_update_doorbell_id(guc, client, db_id);
>   	if (err)
> -		DRM_ERROR("Failed to restore doorbell to %d, err %d\n",
> -			db_id, err);
> +		DRM_WARN("Failed to restore doorbell to %d, err %d\n",
> +			 db_id, err);
>
>   	for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
>   		i915_reg_t drbreg = GEN8_DRBREGL(i);
> @@ -819,8 +817,6 @@ static void guc_init_doorbell_hw(struct intel_guc *guc)
>   	return client;
>
>   err:
> -	DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
> -
>   	guc_client_free(dev_priv, client);
>   	return NULL;
>   }
> @@ -998,7 +994,7 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
>   				  GUC_CTX_PRIORITY_KMD_NORMAL,
>   				  dev_priv->kernel_context);
>   	if (!client) {
> -		DRM_ERROR("Failed to create execbuf guc_client\n");
> +		DRM_ERROR("Failed to create normal GuC client!\n");
>   		return -ENOMEM;
>   	}
>
>

Regards,

Tvrtko
Chris Wilson July 12, 2016, 9:27 a.m. UTC | #2
On Tue, Jul 12, 2016 at 10:20:43AM +0100, Tvrtko Ursulin wrote:
> On 11/07/16 19:01, Dave Gordon wrote:
> >@@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
> >  		if (db_ret.db_status == GUC_DOORBELL_DISABLED)
> >  			break;
> >
> >-		DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
> >-			  db_cmp.cookie, db_ret.cookie);
> >+		DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
> >+			 db_cmp.cookie, db_ret.cookie);
> 
> This one is interesting, error is propagated out a bit but then
> ignored in actual command submission.
> 
> If the above message means command will not be submitted error is
> probably more appropriate. Or perhaps we cannot tell if the command
> was submitted or not in this case?

It's insignificant. An actual error would result in a GPU hang, and
without being recorded in the error state any message here is useless.
-Chris
Tvrtko Ursulin July 12, 2016, 9:37 a.m. UTC | #3
On 12/07/16 10:27, Chris Wilson wrote:
> On Tue, Jul 12, 2016 at 10:20:43AM +0100, Tvrtko Ursulin wrote:
>> On 11/07/16 19:01, Dave Gordon wrote:
>>> @@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
>>>   		if (db_ret.db_status == GUC_DOORBELL_DISABLED)
>>>   			break;
>>>
>>> -		DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
>>> -			  db_cmp.cookie, db_ret.cookie);
>>> +		DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
>>> +			 db_cmp.cookie, db_ret.cookie);
>>
>> This one is interesting, error is propagated out a bit but then
>> ignored in actual command submission.
>>
>> If the above message means command will not be submitted error is
>> probably more appropriate. Or perhaps we cannot tell if the command
>> was submitted or not in this case?
>
> It's insignificant. An actual error would result in a GPU hang, and
> without being recorded in the error state any message here is useless.

I don't agree that it is useless, if it is a very unexpected situation 
it deserves to be logged. People do store and look at logs when things 
go bad.

Regards,

Tvrtko
Dave Gordon July 12, 2016, 12:29 p.m. UTC | #4
On 12/07/16 10:20, Tvrtko Ursulin wrote:
>
> On 11/07/16 19:01, Dave Gordon wrote:
>> Where we're going to continue regardless of the problem, rather than
>> fail, then the message should be a WARNing rather than an ERROR.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_guc_submission.c | 18 +++++++-----------
>>   1 file changed, 7 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 2112e02..e299b64 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -114,10 +114,8 @@ static int host2guc_action(struct intel_guc *guc,
>> u32 *data, u32 len)
>>           if (ret != -ETIMEDOUT)
>>               ret = -EIO;
>>
>> -        DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
>> -                "status=0x%08X response=0x%08X\n",
>> -                data[0], ret, status,
>> -                I915_READ(SOFT_SCRATCH(15)));
>> +        DRM_WARN("Action 0x%X failed; ret=%d status=0x%08X
>> response=0x%08X\n",
>> +             data[0], ret, status, I915_READ(SOFT_SCRATCH(15)));
>
> Hm, this does propagate the error code to the callers some which will
> act and log the failure. Majority won't though - like suspend/resume
> etc. In those cases it feels more like an error than a warning.

It's definitely something that shouldn't happen, so we need to log it; 
and it has to be done at this level because we don't pass enough 
information back to leave it to the caller. But OTOH this layer doesn't 
have enough information to determine just how serious a failure is.

So as a compromise the idea is to log a WARNING here, and then the 
caller can choose to:
1. pass the failure up (until we reach a layer with more context)
2. quietly disregard the failure and continue anyway
3. report an ERROR and fail/abort the process.

That way we should get all the useful information about the root cause 
of something that ends up as an ERROR, while neither ignoring nor being 
too verbose about failures from which we may ultimately recover.

For example, one of the callers (the doorbell h/w initialisation code) 
considers some failures as interesting but not critical (DEBUG level) 
but other instances of the exact same operation are fatal (ERROR).

>>           dev_priv->guc.action_fail += 1;
>>           dev_priv->guc.action_err = ret;
>> @@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct
>> i915_guc_client *gc)
>>           if (db_ret.db_status == GUC_DOORBELL_DISABLED)
>>               break;
>>
>> -        DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
>> -              db_cmp.cookie, db_ret.cookie);
>> +        DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
>> +             db_cmp.cookie, db_ret.cookie);
>
> This one is interesting, error is propagated out a bit but then ignored
> in actual command submission.
>
> If the above message means command will not be submitted error is
> probably more appropriate. Or perhaps we cannot tell if the command was
> submitted or not in this case?

Note that this is inside a retry loop. It shouldn't ever happen, but if 
it does we'll report it and try one more time. If it keeps happening 
(which would require active interference by some other party) we won't 
be able to ring the doorbell and the failure will be propagated back out 
of the GuC submission code. OTOH the caller then ignores it because 
"submission is not allowed to fail" (!)

And yes, it is then undefined as to whether the command has been 
submitted or not. If it hasn't we'll expect a GPU hang later.

.Dave.

>>           /* update the cookie to newly read cookie from GuC */
>>           db_cmp.cookie = db_ret.cookie;
>> @@ -726,8 +724,8 @@ static void guc_init_doorbell_hw(struct intel_guc
>> *guc)
>>       /* Restore to original value */
>>       err = guc_update_doorbell_id(guc, client, db_id);
>>       if (err)
>> -        DRM_ERROR("Failed to restore doorbell to %d, err %d\n",
>> -            db_id, err);
>> +        DRM_WARN("Failed to restore doorbell to %d, err %d\n",
>> +             db_id, err);
>>
>>       for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
>>           i915_reg_t drbreg = GEN8_DRBREGL(i);
>> @@ -819,8 +817,6 @@ static void guc_init_doorbell_hw(struct intel_guc
>> *guc)
>>       return client;
>>
>>   err:
>> -    DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
>> -
>>       guc_client_free(dev_priv, client);
>>       return NULL;
>>   }
>> @@ -998,7 +994,7 @@ int i915_guc_submission_enable(struct
>> drm_i915_private *dev_priv)
>>                     GUC_CTX_PRIORITY_KMD_NORMAL,
>>                     dev_priv->kernel_context);
>>       if (!client) {
>> -        DRM_ERROR("Failed to create execbuf guc_client\n");
>> +        DRM_ERROR("Failed to create normal GuC client!\n");
>>           return -ENOMEM;
>>       }
>>
>>
>
> Regards,
> Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 2112e02..e299b64 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -114,10 +114,8 @@  static int host2guc_action(struct intel_guc *guc, u32 *data, u32 len)
 		if (ret != -ETIMEDOUT)
 			ret = -EIO;
 
-		DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
-				"status=0x%08X response=0x%08X\n",
-				data[0], ret, status,
-				I915_READ(SOFT_SCRATCH(15)));
+		DRM_WARN("Action 0x%X failed; ret=%d status=0x%08X response=0x%08X\n",
+			 data[0], ret, status, I915_READ(SOFT_SCRATCH(15)));
 
 		dev_priv->guc.action_fail += 1;
 		dev_priv->guc.action_err = ret;
@@ -553,8 +551,8 @@  static int guc_ring_doorbell(struct i915_guc_client *gc)
 		if (db_ret.db_status == GUC_DOORBELL_DISABLED)
 			break;
 
-		DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
-			  db_cmp.cookie, db_ret.cookie);
+		DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
+			 db_cmp.cookie, db_ret.cookie);
 
 		/* update the cookie to newly read cookie from GuC */
 		db_cmp.cookie = db_ret.cookie;
@@ -726,8 +724,8 @@  static void guc_init_doorbell_hw(struct intel_guc *guc)
 	/* Restore to original value */
 	err = guc_update_doorbell_id(guc, client, db_id);
 	if (err)
-		DRM_ERROR("Failed to restore doorbell to %d, err %d\n",
-			db_id, err);
+		DRM_WARN("Failed to restore doorbell to %d, err %d\n",
+			 db_id, err);
 
 	for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
 		i915_reg_t drbreg = GEN8_DRBREGL(i);
@@ -819,8 +817,6 @@  static void guc_init_doorbell_hw(struct intel_guc *guc)
 	return client;
 
 err:
-	DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
-
 	guc_client_free(dev_priv, client);
 	return NULL;
 }
@@ -998,7 +994,7 @@  int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 				  GUC_CTX_PRIORITY_KMD_NORMAL,
 				  dev_priv->kernel_context);
 	if (!client) {
-		DRM_ERROR("Failed to create execbuf guc_client\n");
+		DRM_ERROR("Failed to create normal GuC client!\n");
 		return -ENOMEM;
 	}