diff mbox

drm/radeon: resume fence driver to last sync sequence on lockup

Message ID 1355506776-3213-1-git-send-email-j.glisse@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jerome Glisse Dec. 14, 2012, 5:39 p.m. UTC
From: Jerome Glisse <jglisse@redhat.com>

After lockup we need to resume fence to last sync sequence and not
last received sequence so that all thread waiting on command stream
that lockedup resume. Otherwise GPU reset will be ineffective in most
cases.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian König Dec. 14, 2012, 8:13 p.m. UTC | #1
On 14.12.2012 18:39, j.glisse@gmail.com wrote:
> From: Jerome Glisse <jglisse@redhat.com>
>
> After lockup we need to resume fence to last sync sequence and not
> last received sequence so that all thread waiting on command stream
> that lockedup resume. Otherwise GPU reset will be ineffective in most
> cases.
NAK. I changed this on purpose to get partial resets working, please 
don't change it back.

The IB test code should reset this to the last synced value anyway, if 
it doesn't work then there is something wrong there.

Christian.


>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> ---
>   drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index 22bd6c2..38233e7 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -787,7 +787,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
>   	}
>   	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
>   	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
> -	radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring);
> +	radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
>   	rdev->fence_drv[ring].initialized = true;
>   	dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
>   		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
Jerome Glisse Dec. 14, 2012, 8:33 p.m. UTC | #2
On Fri, Dec 14, 2012 at 3:13 PM, Christian König
<deathsimple@vodafone.de> wrote:
> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>
>> From: Jerome Glisse <jglisse@redhat.com>
>>
>> After lockup we need to resume fence to last sync sequence and not
>> last received sequence so that all thread waiting on command stream
>> that lockedup resume. Otherwise GPU reset will be ineffective in most
>> cases.
>
> NAK. I changed this on purpose to get partial resets working, please don't
> change it back.
>
> The IB test code should reset this to the last synced value anyway, if it
> doesn't work then there is something wrong there.
>
> Christian.

There is something wrong ....

Jerome
Christian König Dec. 17, 2012, 9:11 a.m. UTC | #3
On 14.12.2012 21:33, Jerome Glisse wrote:
> On Fri, Dec 14, 2012 at 3:13 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>> From: Jerome Glisse <jglisse@redhat.com>
>>>
>>> After lockup we need to resume fence to last sync sequence and not
>>> last received sequence so that all thread waiting on command stream
>>> that lockedup resume. Otherwise GPU reset will be ineffective in most
>>> cases.
>> NAK. I changed this on purpose to get partial resets working, please don't
>> change it back.
>>
>> The IB test code should reset this to the last synced value anyway, if it
>> doesn't work then there is something wrong there.
>>
>> Christian.
> There is something wrong ....

What symptoms? What exactly is going wrong?

Thinking about it the sequence probably won't get reseted when we 
encounter a unrecoverable GPU lockup. And even when the partial GPU 
reset fails it might be a good idea to reset the fence sequence like 
this....

Ok, you're right there is something wrong. Going to write a patch for 
this...

Cheers,
Christian.
Jerome Glisse Dec. 17, 2012, 3:23 p.m. UTC | #4
On Mon, Dec 17, 2012 at 4:11 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 14.12.2012 21:33, Jerome Glisse wrote:
>>
>> On Fri, Dec 14, 2012 at 3:13 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>>>
>>>> From: Jerome Glisse <jglisse@redhat.com>
>>>>
>>>> After lockup we need to resume fence to last sync sequence and not
>>>> last received sequence so that all thread waiting on command stream
>>>> that lockedup resume. Otherwise GPU reset will be ineffective in most
>>>> cases.
>>>
>>> NAK. I changed this on purpose to get partial resets working, please
>>> don't
>>> change it back.
>>>
>>> The IB test code should reset this to the last synced value anyway, if it
>>> doesn't work then there is something wrong there.
>>>
>>> Christian.
>>
>> There is something wrong ....
>
>
> What symptoms? What exactly is going wrong?
>
> Thinking about it the sequence probably won't get reseted when we encounter
> a unrecoverable GPU lockup. And even when the partial GPU reset fails it
> might be a good idea to reset the fence sequence like this....
>
> Ok, you're right there is something wrong. Going to write a patch for
> this...
>
> Cheers,
> Christian.

I already sent a patch that fix most of the issue. But for failed GPU
reset we need to write it.

Cheers,
Jerome
diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..38233e7 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -787,7 +787,7 @@  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	}
 	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-	radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring);
+	radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
 	rdev->fence_drv[ring].initialized = true;
 	dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);