Message ID | 1355506776-3213-1-git-send-email-j.glisse@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On 14.12.2012 18:39, j.glisse@gmail.com wrote:
> From: Jerome Glisse <jglisse@redhat.com>
>
> After lockup we need to resume the fence to the last sync sequence and not
> the last received sequence, so that all threads waiting on a command stream
> that locked up resume. Otherwise GPU reset will be ineffective in most
> cases.

NAK. I changed this on purpose to get partial resets working, please don't
change it back.

The IB test code should reset this to the last synced value anyway; if it
doesn't work then there is something wrong there.

Christian.

>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_fence.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index 22bd6c2..38233e7 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -787,7 +787,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
>  	}
>  	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
>  	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
> -	radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring);
> +	radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
>  	rdev->fence_drv[ring].initialized = true;
>  	dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
>  		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
On Fri, Dec 14, 2012 at 3:13 PM, Christian König
<deathsimple@vodafone.de> wrote:
> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>
>> From: Jerome Glisse <jglisse@redhat.com>
>>
>> After lockup we need to resume the fence to the last sync sequence and not
>> the last received sequence, so that all threads waiting on a command stream
>> that locked up resume. Otherwise GPU reset will be ineffective in most
>> cases.
>
> NAK. I changed this on purpose to get partial resets working, please don't
> change it back.
>
> The IB test code should reset this to the last synced value anyway; if it
> doesn't work then there is something wrong there.
>
> Christian.

There is something wrong ....

Jerome
On 14.12.2012 21:33, Jerome Glisse wrote:
> On Fri, Dec 14, 2012 at 3:13 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>> From: Jerome Glisse <jglisse@redhat.com>
>>>
>>> After lockup we need to resume the fence to the last sync sequence and not
>>> the last received sequence, so that all threads waiting on a command stream
>>> that locked up resume. Otherwise GPU reset will be ineffective in most
>>> cases.
>> NAK. I changed this on purpose to get partial resets working, please don't
>> change it back.
>>
>> The IB test code should reset this to the last synced value anyway; if it
>> doesn't work then there is something wrong there.
>>
>> Christian.
> There is something wrong ....

What symptoms? What exactly is going wrong?

Thinking about it, the sequence probably won't get reset when we encounter
an unrecoverable GPU lockup. And even when the partial GPU reset fails it
might be a good idea to reset the fence sequence like this....

Ok, you're right, there is something wrong. Going to write a patch for
this...

Cheers,
Christian.
On Mon, Dec 17, 2012 at 4:11 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 14.12.2012 21:33, Jerome Glisse wrote:
>>
>> On Fri, Dec 14, 2012 at 3:13 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> On 14.12.2012 18:39, j.glisse@gmail.com wrote:
>>>>
>>>> From: Jerome Glisse <jglisse@redhat.com>
>>>>
>>>> After lockup we need to resume the fence to the last sync sequence and not
>>>> the last received sequence, so that all threads waiting on a command stream
>>>> that locked up resume. Otherwise GPU reset will be ineffective in most
>>>> cases.
>>>
>>> NAK. I changed this on purpose to get partial resets working, please
>>> don't change it back.
>>>
>>> The IB test code should reset this to the last synced value anyway; if it
>>> doesn't work then there is something wrong there.
>>>
>>> Christian.
>>
>> There is something wrong ....
>
> What symptoms? What exactly is going wrong?
>
> Thinking about it, the sequence probably won't get reset when we encounter
> an unrecoverable GPU lockup. And even when the partial GPU reset fails it
> might be a good idea to reset the fence sequence like this....
>
> Ok, you're right, there is something wrong. Going to write a patch for
> this...
>
> Cheers,
> Christian.

I already sent a patch that fixes most of the issue. But for a failed GPU
reset we still need to write it.

Cheers,
Jerome
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 22bd6c2..38233e7 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -787,7 +787,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	}
 	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-	radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring);
+	radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring);
 	rdev->fence_drv[ring].initialized = true;
 	dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);