diff mbox series

drm/nouveau: Don't disable polling in fallback mode

Message ID 20180912105843.18117-1-tiwai@suse.de (mailing list archive)
State New, archived
Series drm/nouveau: Don't disable polling in fallback mode

Commit Message

Takashi Iwai Sept. 12, 2018, 10:58 a.m. UTC
When a fan is controlled via linear fallback without cstate, we
shouldn't stop polling.  Otherwise it won't be adjusted again and
keeps running at an initial crazy pace.

Fixes: 800efb4c2857 ("drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios")
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1103356
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107447
Reported-by: Thomas Blume <thomas.blume@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

---
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Ben Skeggs Sept. 14, 2018, 7:28 a.m. UTC | #1
On Wed, 12 Sep 2018 at 20:59, Takashi Iwai <tiwai@suse.de> wrote:
>
> When a fan is controlled via linear fallback without cstate, we
> shouldn't stop polling.  Otherwise it won't be adjusted again and
> keeps running at an initial crazy pace.
Martin,

Any thoughts on this?

Ben.

>
> Fixes: 800efb4c2857 ("drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios")
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1103356
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107447
> Reported-by: Thomas Blume <thomas.blume@suse.com>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
> index 3695cde669f8..07914e36939e 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
> @@ -132,11 +132,12 @@ nvkm_therm_update(struct nvkm_therm *therm, int mode)
>                         duty = nvkm_therm_update_linear(therm);
>                         break;
>                 case NVBIOS_THERM_FAN_OTHER:
> -                       if (therm->cstate)
> +                       if (therm->cstate) {
>                                 duty = therm->cstate;
> -                       else
> +                               poll = false;
> +                       } else {
>                                 duty = nvkm_therm_update_linear_fallback(therm);
> -                       poll = false;
> +                       }
>                         break;
>                 }
>                 immd = false;
> --
> 2.18.0
>
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
Martin Peres Sept. 14, 2018, 11:59 a.m. UTC | #2
On 14/09/2018 10:28, Ben Skeggs wrote:
> On Wed, 12 Sep 2018 at 20:59, Takashi Iwai <tiwai@suse.de> wrote:
>>
>> When a fan is controlled via linear fallback without cstate, we
>> shouldn't stop polling.  Otherwise it won't be adjusted again and
>> keeps running at an initial crazy pace.
> Martin,
> 
> Any thoughts on this?
> 
> Ben.

Wow, blast from the past!

Anyway, the analysis is pretty spot on here. When using the cstate-based
fan speed (the fan speed changes with the clock frequency in use),
polling is unnecessary, and this function should only be called when
changing the pstate.

However, in the absence of ANY information, we fall back to
temperature-based management, which requires constant polling. So the
patch is accurate, and poll = false should only be set if we have a cstate.

So, the patch is Reviewed-by: Martin Peres <martin.peres@free.fr>
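
The polling rule described above can be sketched as a small standalone C
model. This is an illustration only, not the driver code: struct fan_state,
fallback_duty(), update_before() and update_after() are hypothetical names,
and the 30%-to-100% ramp between 30°C and 90°C is an assumed curve chosen
for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real struct nvkm_therm. */
struct fan_state {
	int cstate_duty; /* duty (%) requested by the current cstate, 0 if none */
	int temp_c;      /* current GPU temperature in degrees Celsius */
};

/* Assumed fallback curve: 30% duty at or below 30 C, 100% at or above 90 C. */
static int fallback_duty(int temp_c)
{
	if (temp_c <= 30)
		return 30;
	if (temp_c >= 90)
		return 100;
	return 30 + (temp_c - 30) * 70 / 60;
}

/* Behaviour before the patch: polling is stopped on both branches. */
static int update_before(const struct fan_state *s, bool *poll)
{
	assert(s && poll);
	*poll = false;
	return s->cstate_duty ? s->cstate_duty : fallback_duty(s->temp_c);
}

/* Behaviour after the patch: polling stops only when a cstate sets the duty. */
static int update_after(const struct fan_state *s, bool *poll)
{
	assert(s && poll);
	*poll = true;
	if (s->cstate_duty) {
		*poll = false;
		return s->cstate_duty;
	}
	return fallback_duty(s->temp_c);
}
```

With no cstate, both functions pick the same duty for the current
temperature, but only update_after() leaves poll set, so the duty can be
recomputed when the temperature changes later.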

> 
>>
>> Fixes: 800efb4c2857 ("drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios")
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1103356
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107447

I see that Thomas has been having issues with the noise level anyway. I
suggest he bump the value of temp1_auto_point1_temp (see
https://www.kernel.org/doc/Documentation/thermal/nouveau_thermal).

The default value is set to 90°C, which is quite safe on these old GPUs
(NVIDIA G71 / nv49). I would say that it is safe to go up to 110°C,
which should reduce the noise level.

Another technique may be to reduce the minimum fan speed to something
lower than 30°C. It should increase the slope but reduce the noise level
at a given temperature.
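
The effect of both suggestions can be checked with a small linear
interpolation. The sketch below is an assumption-laden illustration, not
the driver's implementation: linear_duty() is a hypothetical helper, and
the ramp start of 30°C and the minimum duty of 30% are example values.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a temperature onto a fan duty (%) with a
 * straight line from (min_temp, min_duty) to (max_temp, max_duty),
 * clamped at both ends. Integer division truncates toward zero.
 */
static int linear_duty(int temp, int min_temp, int max_temp,
		       int min_duty, int max_duty)
{
	assert(max_temp > min_temp);
	assert(max_duty >= min_duty);

	if (temp < min_temp)
		return min_duty;
	if (temp > max_temp)
		return max_duty;

	return min_duty +
	       (temp - min_temp) * (max_duty - min_duty) / (max_temp - min_temp);
}
```

Under these assumed numbers, a GPU at 60°C gets 65% duty with the upper
point at 90°C and 56% with the upper point at 110°C. Keeping 90°C but
lowering the minimum duty from 30% to 20% gives 60%: the line is steeper,
yet the duty at that temperature is lower.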

One reason why these GPUs run so hot on nouveau is the lack of power and
clock gating. I am sorry that I never finished reverse engineering these...

Anyway, thanks a lot for the patch!

Takashi Iwai Nov. 14, 2018, 4:01 p.m. UTC | #3
On Fri, 14 Sep 2018 13:59:25 +0200,
Martin Peres wrote:
> 
> On 14/09/2018 10:28, Ben Skeggs wrote:
> > On Wed, 12 Sep 2018 at 20:59, Takashi Iwai <tiwai@suse.de> wrote:
> >>
> >> When a fan is controlled via linear fallback without cstate, we
> >> shouldn't stop polling.  Otherwise it won't be adjusted again and
> >> keeps running at an initial crazy pace.
> > Martin,
> > 
> > Any thoughts on this?
> > 
> > Ben.
> 
> Wow, blast from the past!
> 
> Anyway, the analysis is pretty spot on here. When using the cstate-based
> fan speed (change the speed of the fan based on what frequency is used),
> then polling is unnecessary and this function should only be called when
> changing the pstate.
> 
> However, in the absence of ANY information, we fall back to a
> temperature-based management which requires constant polling, so the
> patch is accurate and poll = false should only be set if we have a cstate.
> 
> So, the patch is Reviewed-by: Martin Peres <martin.peres@free.fr>

Just a gentle reminder: this patch seems forgotten for 4.20 merge.
Could you guys pick it if it's OK?


Thanks!

Takashi

Ilia Mirkin Dec. 30, 2018, 8:42 a.m. UTC | #4
Ben - ping? Just ran into this myself on a NV42.


Patch

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
index 3695cde669f8..07914e36939e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
@@ -132,11 +132,12 @@  nvkm_therm_update(struct nvkm_therm *therm, int mode)
 			duty = nvkm_therm_update_linear(therm);
 			break;
 		case NVBIOS_THERM_FAN_OTHER:
-			if (therm->cstate)
+			if (therm->cstate) {
 				duty = therm->cstate;
-			else
+				poll = false;
+			} else {
 				duty = nvkm_therm_update_linear_fallback(therm);
-			poll = false;
+			}
 			break;
 		}
 		immd = false;