drm/radeon: Reset ASIC if suspend is not managed by platform firmware

Message ID 20200901063227.6057-1-kai.heng.feng@canonical.com (mailing list archive)
State New, archived

Commit Message

Kai-Heng Feng Sept. 1, 2020, 6:32 a.m. UTC
Suspending with s2idle, or via the following steps, causes the screen to freeze:
 # echo devices > /sys/power/pm_test
 # echo freeze > /sys/power/state

[  289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
[  289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110).

The issue doesn't happen on traditional S3, probably because firmware or
hardware provides extra power management.

Inspired by Daniel Drake's patch [1] on amdgpu, using a similar approach
can fix the issue.

[1] https://patchwork.freedesktop.org/patch/335839/

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/gpu/drm/radeon/radeon_device.c | 3 +++
 1 file changed, 3 insertions(+)
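
The commit message contrasts s2idle with traditional S3. As a rough aid for telling the two apart on a test machine: kernels that expose /sys/power/mem_sleep list the available variants there and put the one used for "mem" in square brackets (for example "s2idle [deep]"), while writing "freeze" to /sys/power/state requests suspend-to-idle regardless of that selection. The sketch below only parses such a string; the function names selected_mem_sleep() and uses_s2idle() are hypothetical and are not part of the patch or of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed entry of a /sys/power/mem_sleep-style string
 * (for example "s2idle [deep]") into out. Returns 0 on success, -1 if
 * no bracketed entry is found or it does not fit in out.
 */
int selected_mem_sleep(const char *contents, char *out, size_t out_len)
{
	const char *open, *close;
	size_t len;

	assert(contents != NULL && out != NULL);

	open = strchr(contents, '[');
	if (!open)
		return -1;
	close = strchr(open, ']');
	if (!close)
		return -1;

	/* Number of characters strictly between '[' and ']'. */
	len = (size_t)(close - open - 1);
	if (len + 1 > out_len)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Hypothetical helper: 1 when the selected variant is suspend-to-idle. */
int uses_s2idle(const char *contents)
{
	char mode[16];

	if (selected_mem_sleep(contents, mode, sizeof(mode)) != 0)
		return 0;
	return strcmp(mode, "s2idle") == 0;
}
```

A caller would read the sysfs file into a buffer and pass it to uses_s2idle(); the parsing is kept separate from the file access so it can be exercised with plain strings.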

Comments

Alex Deucher Sept. 1, 2020, 2:19 p.m. UTC | #1
On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> Suspending with s2idle, or via the following steps, causes the screen to freeze:
>  # echo devices > /sys/power/pm_test
>  # echo freeze > /sys/power/state
>
> [  289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
> [  289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110).
>
> The issue doesn't happen on traditional S3, probably because firmware or
> hardware provides extra power management.
>
> Inspired by Daniel Drake's patch [1] on amdgpu, using a similar approach
> can fix the issue.

It doesn't actually fix the issue.  The device is never powered down
so you are using more power than you would if you did not suspend in
the first place.  The reset just works around the fact that the device
is never powered down.

Alex

Kai-Heng Feng Sept. 1, 2020, 4:20 p.m. UTC | #2
> On Sep 1, 2020, at 22:19, Alex Deucher <alexdeucher@gmail.com> wrote:
> 
> On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
>> 
>> Suspending with s2idle, or via the following steps, causes the screen to freeze:
>> # echo devices > /sys/power/pm_test
>> # echo freeze > /sys/power/state
>> 
>> [  289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
>> [  289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110).
>> 
>> The issue doesn't happen on traditional S3, probably because firmware or
>> hardware provides extra power management.
>> 
>> Inspired by Daniel Drake's patch [1] on amdgpu, using a similar approach
>> can fix the issue.
> 
> It doesn't actually fix the issue.  The device is never powered down
> so you are using more power than you would if you did not suspend in
> the first place.  The reset just works around the fact that the device
> is never powered down.

So how do we properly suspend/resume the device without help from platform firmware?

Kai-Heng

Alex Deucher Sept. 1, 2020, 4:30 p.m. UTC | #3
On Tue, Sep 1, 2020 at 12:21 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
>
>
> > On Sep 1, 2020, at 22:19, Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng
> > <kai.heng.feng@canonical.com> wrote:
> >>
> >> Suspending with s2idle, or via the following steps, causes the screen to freeze:
> >> # echo devices > /sys/power/pm_test
> >> # echo freeze > /sys/power/state
> >>
> >> [  289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
> >> [  289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110).
> >>
> >> The issue doesn't happen on traditional S3, probably because firmware or
> >> hardware provides extra power management.
> >>
> >> Inspired by Daniel Drake's patch [1] on amdgpu, using a similar approach
> >> can fix the issue.
> >
> > It doesn't actually fix the issue.  The device is never powered down
> > so you are using more power than you would if you did not suspend in
> > the first place.  The reset just works around the fact that the device
> > is never powered down.
>
> So how do we properly suspend/resume the device without help from platform firmware?

I guess you don't?

Alex


Kai-Heng Feng Sept. 2, 2020, 12:39 a.m. UTC | #4
> On Sep 2, 2020, at 00:30, Alex Deucher <alexdeucher@gmail.com> wrote:
> 
> On Tue, Sep 1, 2020 at 12:21 PM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
>> 
>> 
>> 
>>> On Sep 1, 2020, at 22:19, Alex Deucher <alexdeucher@gmail.com> wrote:
>>> 
>>> On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng
>>> <kai.heng.feng@canonical.com> wrote:
>>>> 
>>>> Suspending with s2idle, or via the following steps, causes the screen to freeze:
>>>> # echo devices > /sys/power/pm_test
>>>> # echo freeze > /sys/power/state
>>>> 
>>>> [  289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
>>>> [  289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-110).
>>>> 
>>>> The issue doesn't happen on traditional S3, probably because firmware or
>>>> hardware provides extra power management.
>>>> 
>>>> Inspired by Daniel Drake's patch [1] on amdgpu, using a similar approach
>>>> can fix the issue.
>>> 
>>> It doesn't actually fix the issue.  The device is never powered down
>>> so you are using more power than you would if you did not suspend in
>>> the first place.  The reset just works around the fact that the device
>>> is never powered down.
>> 
>> So how do we properly suspend/resume the device without help from platform firmware?
> 
> I guess you don't?

Unfortunate, but I guess we need to accept reality and use the default suspend method.

Kai-Heng

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 266e3cbbd09b..df823b9ad79f 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -33,6 +33,7 @@ 
 #include <linux/slab.h>
 #include <linux/vga_switcheroo.h>
 #include <linux/vgaarb.h>
+#include <linux/suspend.h>
 
 #include <drm/drm_cache.h>
 #include <drm/drm_crtc_helper.h>
@@ -1643,6 +1644,8 @@  int radeon_suspend_kms(struct drm_device *dev, bool suspend,
 		rdev->asic->asic_reset(rdev, true);
 		pci_restore_state(dev->pdev);
 	} else if (suspend) {
+		if (pm_suspend_no_platform())
+			rdev->asic->asic_reset(rdev, true);
 		/* Shut down the device */
 		pci_disable_device(dev->pdev);
 		pci_set_power_state(dev->pdev, PCI_D3hot);
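
The ordering this hunk introduces can be traced in user space with a small mock. This is only a sketch under stated assumptions: it mirrors just the "else if (suspend)" branch, the no_platform argument stands in for the return value of pm_suspend_no_platform(), and every mock_* name is hypothetical. It touches no hardware, and it says nothing about power draw, which is the objection raised in the review: the reset works around the device never being powered down rather than powering it down.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock actions, recorded in call order; none of this touches hardware. */
enum mock_action { MOCK_ASIC_RESET, MOCK_PCI_DISABLE, MOCK_PCI_D3HOT };

struct mock_trace {
	enum mock_action actions[4];
	int count;
};

static void record(struct mock_trace *t, enum mock_action a)
{
	assert(t->count < 4);
	t->actions[t->count++] = a;
}

/*
 * Mirrors only the "else if (suspend)" branch touched by the patch.
 * no_platform stands in for the return value of pm_suspend_no_platform().
 */
void mock_suspend_branch(struct mock_trace *t, bool suspend, bool no_platform)
{
	if (!suspend)
		return;
	/* Added by the patch: reset first when no platform firmware is involved. */
	if (no_platform)
		record(t, MOCK_ASIC_RESET);
	/* Unchanged context: shut down the device. */
	record(t, MOCK_PCI_DISABLE);
	record(t, MOCK_PCI_D3HOT);
}
```

With no_platform set, the trace is reset, disable, D3hot; with it clear, the trace is the unpatched disable, D3hot sequence.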