mbox series

[v2,0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)

Message ID 20240922172258.48435-1-lkml@antheas.dev (mailing list archive)
Headers show
Series acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) | expand

Message

Antheas Kapenekakis Sept. 22, 2024, 5:22 p.m. UTC
The following series moves the Display off/on calls outside of the suspend
sequence, as they are performed in Windows. This fixes certain issues that appear
in devices that use the calls and expect the kernel to be active during their
call (especially in the case of the ROG Ally devices) and opens the possibility
of implementing a "Screen Off" state in the future (which mirrors Windows).
In addition, it adds a quirk table that will allow for adding delays between
Modern Standby transitions, to help resolve racing conditions.

This series requires a bit of background on how modern standby works in Windows.
Fundamentally, it is composed of four states: "Active", "Screen Off", "Sleep",
and "DRIPS". Here, I take the liberty of naming the state "Active", as it is
implied in Windows documentation.

When the user actively interacts with a device, it is in the "Active" state.
The screen is on, all devices are connected, and desired software is running.
The other 3 stages play a role once the user stops interacting with the device.
This is triggered in two main ways: either by pressing the power button or by
inactivity. Once either of those targets is met, the system enters Modern Standby.

Modern Standby consists of an orchestration of the "Screen Off", "Sleep", and
"DRIPS" states. Windows is free to move throughout these states until the user
interacts with the device again, where the device will transition to being
"Active". Moving between the states implies a transition, where Windows performs
a set of actions. In addition, Windows can only move between adjacent states
as follows:

"Active" <-> "Screen Off" <-> "Sleep" <-> "DRIPS"

"Screen Off" is the state where all active displays in the device (whether
*virtual* or real; this means unrelated to DRM) are off. The user might still
be interacting with the device or running programs (e.g., compiling a kernel).

"Sleep" is a newer state, in which the device turns off its fan and pulses its
power button, but still supports running software activities. As part of this,
and to avoid overheating the device a lot of manufacturers lower the TDP (PLx)
of the device [3; _DSM 9 description].

Finally, DRIPS stands for Deepest Runtime Idle Power State, i.e. suspend.

While Windows may transition from any state to any state, doing so implies
performing all transitions to reach that state. All states other than DRIPS
have a fully active kernel (Wifi, USB at least) and allow userspace activity.
What changes is the extent of the activity, and whether some power consuming
devices are turned off (this is done with Modern Standby aware drivers and
firmware notifications). The Windows kernel suspends during the transition from
the "Sleep" state to the "DRIPS" state. In all other states it is active.

After finishing each transition, the kernel performs a firmware notification,
described in the _DSM document [3]. Moving from left to right with function num.,
we have Display Off (3; Active -> Screen Off), Sleep Entry (7; Screen Off -> Sleep),
and Lowest Power State Entry Notification (5; LPSEN; Sleep -> DRIPS). Then, from
right to left, we have Lowest Power State Exit Notification (6; DRIPS -> Sleep),
Sleep Exit (8; Sleep -> Screen) and Display On (4; Screen Off -> Active).

The Linux kernel is not currently Modern Standby aware but will still make these
calls. Currently, the problem is that the kernel calls all of the firmware
notifications at the point LPSEN (5, 6) should be called, which is when the
kernel is mostly suspended. This is a clear deviation from Windows, and causes
undesirable behavior in certain devices, the main one focused in this patch
series being the ROG Ally. Although series patch is aimed at Modern Standby
devices in general.

The ROG Ally is a Modern Standby capable device (uses Secure Core too; really
ticks all the MS boxes) and in it, there are issues with both Display 3,4
calls and Sleep 7,8 calls cause issues (7,8 are suspected and todo).

The Display 3,4 calls are responsible for the controller. The Display Off call
disconnects (if powersave is off) or powers off (if powersave is on and on DC
power) the MCU(s) responsible for the controller and deactivates the RGB of the
device. Display On powers on or reconnects the controller respectively.
This controller, in the Ally X, is composed of 6 HID devices that register to
the kernel. From testing, it seems that the majority of the problem in the Ally
comes from Display Off being called way too late timewise, and Display

The Sleep 7,8 calls, in general, are responsible for setting a low power state
that is safe to use while the device is sleeping, making the suspend light
pulse, and turning off the fan. Due to a variety of race conditions, there is
a rare occasion where the Ally EC can get stuck in its Sleep mode, where the
TDP is 5W, and prevent increasing it until the device reboots. The sleep entries
contain actions in the Ally, so there is a suspicion that calling them during
DRIPS is causing issues. However, this is not the subject of this patch and
has not been verified yet.

This patch centers around moving the Display 3,4 calls outside the suspend
sequence (which is the transition from Sleep to DRIPS in Modern Standby terms),
and by implementing the proper locks necessary, opening up the possibility of
making these calls as part of a more elaborate "Modern Standby"-like userspace
suspend/wakelock implementation. As of this patch, they are only called before
the suspend sequence, including with the possibility of adding a delay.

This makes the intent of this patch primarily compatibility focused, as it aims
to fix issues by the current implementation. And to that end it works.
After moving the calls outside of the suspend sequence, my ROG Ally X test unit
can suspend more than 50 times without rebooting, both with powersave on or off,
regardless of whether it is plugged/unplugged during suspend, and still have the
controller work with full reliability. In V1, there was an unsolved race condition
that was dealt by (5) before Display Off triggers. Essentially, Linux suspends
too fast for the current version of the firmware to deal with. After adding a
quirk table, which delays suspend after the Display Off call, the controller
of the original Ally should power off properly (a lot of testing will be done).

Moving the calls outside of the suspend sequence (and the validation work it
implies) is an important first step in including "Modern Standby"-like
features in Linux. For example, consider an endpoint /sys/power/standby, that
allows for entering "active", "inactive" (for Screen Off; since the name causes
too much confusion), "sleep" values. Those values will then in turn call the
respective firmware notifications (and driver callbacks in the very future)
for all transitions required to reach the entered state. Here, the value
"suspend" (for DRIPS; another confusing name as it can refer to drivers) is
missing, as userspace will never be able to see it. The kernel should support
suspending at all standby states, orchestrating the required transitions to
reach suspend/DRIPS and after suspend returning to the last state.

Therefore, if userspace is not standby aware, the kernel will work the same way
it works today. In addition, depending on hardware generation, certain power
states might not be supported. It is important to inform userspace of this, as
if the hardware does not support sleep, and userspace holds a wakelock for sleep,
it will just overheat and drain the device battery.

This series is worth backing this up with sources, so as part of it I reference
Microsoft's documentation on Modern standby [1-3] that explains the whole
process, including a document by Dell [7] and how to prepare for them and attach a
repository of more than 15 device DSDT tables [4] from different manufacturers.
This repository also contains instructions on how to decode the DSDT tables on
a test laptop, to figure out what the _DSM calls will do on that device.

Moreover, I conduct a short behavioral test in Windows with the Ally X to showcase
the documentation empirically. The Ally is great for such a test, as it contains
visual indicators for all Microsoft suspend points: "Display Off/On" calls are
indicated with the Controller RGB turning off/on, "Screen Off" is indicated with
the suspend light and fan being on, and Sleep is indicated with the suspend
light blinking.

Unfortunately, as part of this testing, I never found how to see if the device
is actually suspended. As the ROG Ally X NOOPs on firmware notifications 5,6,
and even though I disabled a Mouse from waking up a device, it still would wake
up my Ally X dev unit.

Referencing Microsoft's documentation, "Screen Off" is entered either through
inactivity or by pressing the power button, so I conduct two tests: one by pressing
the powerbutton, and one for entering Screen Off due to inactivity.

1) Powerbutton test:
When pressing the powerbutton, the screen of the Ally turns off, and the RGB of
the controller faints to off within 1s. Following, depending on whether the
system is plugged in, the power light and fan stay on for 5 seconds to 10 minutes.
After this point, the power light begins to blink and the fan turns off, showing
that the system has entered the "Sleep" state.

2) Inactivity test:
I set the Windows power settings to turn off the screen after 1 minute and wait.
After one minute, the display turns off, and after 5 seconds, the controller RGB
turns off. This indicates to me that "Screen Off" is not defined by the screen
being off, but is rather characterized by it. During those 5 seconds while the
RGB is on, I can use the controller to wake up the device. Afterwards it cannot.

Those tests validate Microsoft's documentation and show that "Screen Off"
seems to more closely correlate to lockscreen behavior (button locks instantly,
inactivity after 5 seconds) than the screen being off. One other behavior I
notice is that, as I look at my Ally X dev right now, with its screen off, I
notice the RGB is still on, which is kind of bothersome, as in Windows the
device would turn the RGB off. Whether as a side effect or planned, it is still
a nice touch.

This patch series is developed with help from Mario Limonciello, and, to be
bisection friendly, is structured based on a patch series he made connecting the
callbacks to the drm subsystem suspend [5]. It also references (already)
upstream work by Luke Jones on Asus-wmi for the Ally controller quirk that is
removed on patch (5) and an issue on amd-drm in 2023 in preparation for the
work in that quirk [6]. Since patch (3) now uses part of the dmi table removed
in patch (5) and adds a (small) delay, @Luke I can add you as Suggested-by.

We will begin testing on the patch series, and there will probably be a V3,
where testing acknowledgements are added. V2 patch adds a delay to display_off
(500ms), where 300-1300ms were tried, and there was no behavioral difference on
the Ally X. However, that is arbitrary so it warrants a lot of testing.
Current status is that my Ally X unit works perfectly other than a little quirk:
with powersave on, if asus_hid or a userspace program talks to it within
2 seconds, it causes the RGB to softfade to off and then on. This is a cosmetic
issue that can be dealt with by userspace (waiting 2s) or a firmware update
or both. Windows did not seem to fare much better either in that regard, with
RGB turning on and off randomly. Original Ally still needs to be verified.

I am personally going to take a bit of a breather on this patch, test it, and
revisit it next week. I send it today so I get comments on the revision.

Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/prepare-hardware-for-modern-standby [1]
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/prepare-software-for-modern-standby [2]
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications [3]
Link: https://github.com/hhd-dev/hwinfo/tree/master/devices [4]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off [5]
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2719 [6]
Link: https://dl.dell.com/manuals/all-products/esuprt_solutions_int/esuprt_solutions_int_solutions_resources/client-mobile-solution-resources_white-papers45_en-us.pdf [7]

Changes in v2:
 - Rewrote cover letter to better reflect the Windows Modern Standby sequence
 - Renamed the "screen_" callbacks to "display_" to match Microsoft's naming
 - Added attribution to Mario Limonciello and changed the text to reflect that
 - Made the screen on/off callbacks warn and bail with -EINVAL if they are
   called in the wrong order (currently impossible)
    - Changed patch 2 to not bail the suspend sequence when receiving an error
      (as these calls are not part of the suspend sequence and failing the
      suspend sequence would cause a user visible fault)
 - Fixed bug reported by Denis Benato by adding a quirk table in patch (3)
    - The ROG controllers get a slight delay after the Display Off call
    - This delay fixes a race condition with the controller disconnecting from
      the system before being powered off
    - Reworded patches to reflect that
 - Moved the display calls a bit higher up the suspend sequence in patch (2)

Antheas Kapenekakis (5):
  acpi/x86: s2idle: add support for Display Off and Display On callbacks
  acpi/x86: s2idle: handle Display On/Off calls outside of suspend
    sequence
  acpi/x86: s2idle: add quirk table for modern standby delays
  acpi/x86: s2idle: call Display On/Off as part of callbacks and rename
  platform/x86: asus-wmi: remove Ally (1st gen) and Ally X suspend quirk

 drivers/acpi/x86/s2idle.c       | 89 +++++++++++++++++++++++----------
 drivers/platform/x86/asus-wmi.c | 54 --------------------
 include/linux/suspend.h         | 10 ++++
 kernel/power/suspend.c          | 62 +++++++++++++++++++++++
 4 files changed, 134 insertions(+), 81 deletions(-)

Comments

Hans de Goede Oct. 5, 2024, 2:21 p.m. UTC | #1
Hi,

On 22-Sep-24 7:22 PM, Antheas Kapenekakis wrote:
> The following series moves the Display off/on calls outside of the suspend
> sequence, as they are performed in Windows. This fixes certain issues that appear
> in devices that use the calls and expect the kernel to be active during their
> call (especially in the case of the ROG Ally devices) and opens the possibility
> of implementing a "Screen Off" state in the future (which mirrors Windows).
> In addition, it adds a quirk table that will allow for adding delays between
> Modern Standby transitions, to help resolve racing conditions.
> 
> This series requires a bit of background on how modern standby works in Windows.
> Fundamentally, it is composed of four states: "Active", "Screen Off", "Sleep",
> and "DRIPS". Here, I take the liberty of naming the state "Active", as it is
> implied in Windows documentation.
> 
> When the user actively interacts with a device, it is in the "Active" state.
> The screen is on, all devices are connected, and desired software is running.
> The other 3 stages play a role once the user stops interacting with the device.
> This is triggered in two main ways: either by pressing the power button or by
> inactivity. Once either of those targets is met, the system enters Modern Standby.
> 
> Modern Standby consists of an orchestration of the "Screen Off", "Sleep", and
> "DRIPS" states. Windows is free to move throughout these states until the user
> interacts with the device again, where the device will transition to being
> "Active". Moving between the states implies a transition, where Windows performs
> a set of actions. In addition, Windows can only move between adjacent states
> as follows:
> 
> "Active" <-> "Screen Off" <-> "Sleep" <-> "DRIPS"
> 
> "Screen Off" is the state where all active displays in the device (whether
> *virtual* or real; this means unrelated to DRM) are off. The user might still
> be interacting with the device or running programs (e.g., compiling a kernel).
> 
> "Sleep" is a newer state, in which the device turns off its fan and pulses its
> power button, but still supports running software activities. As part of this,
> and to avoid overheating the device a lot of manufacturers lower the TDP (PLx)
> of the device [3; _DSM 9 description].
> 
> Finally, DRIPS stands for Deepest Runtime Idle Power State, i.e. suspend.
> 
> While Windows may transition from any state to any state, doing so implies
> performing all transitions to reach that state. All states other than DRIPS
> have a fully active kernel (Wifi, USB at least) and allow userspace activity.
> What changes is the extent of the activity, and whether some power consuming
> devices are turned off (this is done with Modern Standby aware drivers and
> firmware notifications). The Windows kernel suspends during the transition from
> the "Sleep" state to the "DRIPS" state. In all other states it is active.
> 
> After finishing each transition, the kernel performs a firmware notification,
> described in the _DSM document [3]. Moving from left to right with function num.,
> we have Display Off (3; Active -> Screen Off), Sleep Entry (7; Screen Off -> Sleep),
> and Lowest Power State Entry Notification (5; LPSEN; Sleep -> DRIPS). Then, from
> right to left, we have Lowest Power State Exit Notification (6; DRIPS -> Sleep),
> Sleep Exit (8; Sleep -> Screen) and Display On (4; Screen Off -> Active).
> 
> The Linux kernel is not currently Modern Standby aware but will still make these
> calls. Currently, the problem is that the kernel calls all of the firmware
> notifications at the point LPSEN (5, 6) should be called, which is when the
> kernel is mostly suspended. This is a clear deviation from Windows, and causes
> undesirable behavior in certain devices, the main one focused in this patch
> series being the ROG Ally. Although series patch is aimed at Modern Standby
> devices in general.
> 
> The ROG Ally is a Modern Standby capable device (uses Secure Core too; really
> ticks all the MS boxes) and in it, there are issues with both Display 3,4
> calls and Sleep 7,8 calls cause issues (7,8 are suspected and todo).
> 
> The Display 3,4 calls are responsible for the controller. The Display Off call
> disconnects (if powersave is off) or powers off (if powersave is on and on DC
> power) the MCU(s) responsible for the controller and deactivates the RGB of the
> device. Display On powers on or reconnects the controller respectively.
> This controller, in the Ally X, is composed of 6 HID devices that register to
> the kernel. From testing, it seems that the majority of the problem in the Ally
> comes from Display Off being called way too late timewise, and Display
> 
> The Sleep 7,8 calls, in general, are responsible for setting a low power state
> that is safe to use while the device is sleeping, making the suspend light
> pulse, and turning off the fan. Due to a variety of race conditions, there is
> a rare occasion where the Ally EC can get stuck in its Sleep mode, where the
> TDP is 5W, and prevent increasing it until the device reboots. The sleep entries
> contain actions in the Ally, so there is a suspicion that calling them during
> DRIPS is causing issues. However, this is not the subject of this patch and
> has not been verified yet.
> 
> This patch centers around moving the Display 3,4 calls outside the suspend
> sequence (which is the transition from Sleep to DRIPS in Modern Standby terms),
> and by implementing the proper locks necessary, opening up the possibility of
> making these calls as part of a more elaborate "Modern Standby"-like userspace
> suspend/wakelock implementation. As of this patch, they are only called before
> the suspend sequence, including with the possibility of adding a delay.
> 
> This makes the intent of this patch primarily compatibility focused, as it aims
> to fix issues by the current implementation. And to that end it works.
> After moving the calls outside of the suspend sequence, my ROG Ally X test unit
> can suspend more than 50 times without rebooting, both with powersave on or off,
> regardless of whether it is plugged/unplugged during suspend, and still have the
> controller work with full reliability. In V1, there was an unsolved race condition
> that was dealt by (5) before Display Off triggers. Essentially, Linux suspends
> too fast for the current version of the firmware to deal with. After adding a
> quirk table, which delays suspend after the Display Off call, the controller
> of the original Ally should power off properly (a lot of testing will be done).

Thank you for your work on this and thank you for the comprehensive write-up
on how Windows does modern standby.

First of all may I suggest that you take the above write-up, minus the ROG
Ally specific bits and turn this into a new documentation file under
Documentation/power ?  And also document at which point Linux currently
makes the various transitions.

And then in patches where you move the transitions, also update the docs
on what Linux does to match.

I have read the discussion about tying the display on/off calls to CRTC state
and/or exposing a userspace knob for that. I think that this needs more
discussion / design work.

OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
be good to have. 3 is a bit unfortunate and not necessary with the current
special ROG Ally handling in the asus-wmi driver. It might be better to
just keep the quirks there.

IMHO it would be good to submit a v2 of just patches 1 - 3 run through
checkpatch. Also the commit message of patch 3 should point to the existing
quirk code in asus-wmi.c and mention that then is no longer necessary after
patch 3, then we can discuss what is the best place for these quirks.

Rafael, what do you think about at least taking patches 1 - 3 upstream?
Reading through how Windows handles things making the display on/off
calls before suspending devices sounds like it is the right thing to do
to me.

Regards,

Hans
Antheas Kapenekakis Oct. 5, 2024, 3:10 p.m. UTC | #2
Hi Hans,

> Thank you for your work on this and thank you for the comprehensive write-up
> on how Windows does modern standby.
>
> First of all may I suggest that you take the above write-up, minus the ROG
> Ally specific bits and turn this into a new documentation file under
> Documentation/power ?  And also document at which point Linux currently
> makes the various transitions.

I will try to move some of that documentation there, this is a great idea.

> And then in patches where you move the transitions, also update the docs
> on what Linux does to match.
>
> I have read the discussion about tying the display on/off calls to CRTC state
> and/or exposing a userspace knob for that. I think that this needs more
> discussion / design work.

Yes, you are right. To become a knob this would require a much bigger
discussion. I would also like to move Sleep calls as part of that. The
Legion Go and OneXPlayer devices turn off their controllers as part of
that and other modern standby devices limit their power envelope
(perhaps the Legion Go too). I think the Sleep call is where most of
the userspace usability will come from. Display On/Off is a bit of a
NOOP on most devices.

As for the LSB0 enter and exit, I do not know where the correct place
for those would be, and perhaps someone from Microsoft needs to be
consulted on that. The documentation is very vague. However it is
clear to me that they should be close to where they are right now, so
they very likely do not need to move.

There is also the new _DSM intent to turn display on 9 call. Which
meshes with the sleep call. That call is made before Sleep Exit, if
the kernel knows that the wake-up will cause the display to turn on,
to boost the thermal envelope of the device and help it wake up
quicker. If the Sleep call is moved then we would also have to
introduce that somewhere to avoid wake-up time regressions on devices
that support it, which also raises the question of how would the
kernel decide if an interrupt will cause the display to turn on before
unfreezing userspace (bulk of resume) (or should it be done after
unfreezing?).

> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
> be good to have. 3 is a bit unfortunate and not necessary with the current
> special ROG Ally handling in the asus-wmi driver. It might be better to
> just keep the quirks there.

From what I know Luke plans to remove that quirk ASAP due to new
firmware. I would keep it around until this patch series merges
personally and remove it as part of that. As it will probably cause
regressions if both are in place and require manual intervention if
either is not. I will also note that the quirk in asus-wmi calls the
Display On/Off calls a second time and during the suspend sequence,
which is not in any way proper. So if future devices need this kind of
quirk, it really does not seem like a good idea to me to paper over
their problems by calling the notifications a second time in random
platform drivers. There is the question of where that quirk should be
placed, that is true, but I IMO it should be a pm problem.

Perhaps not in the location where I put it though and perhaps it
should be done with LSB0 callbacks instead. Although, being done this
way allows for it to blend with the suspend sequence. Ideally, the
Display Off delay would be blended with userspace going down such that
if e.g., there is heavy userspace activity that requires ~2s to
freeze, the quirk would add no delay. Instead, it would only add delay
if userspace freezes quickly (less than .5s). Same can be said with
Sleep Entry and beginning prepare_late, which blocks the EC interrupts
(that would need a lot of investigation though).

On that note, it seems to me that the Ally has 2 bugs related to the
_DSM calls 3 and 4. First bug is that Display On is gated on current
firmware and only works when the USB subsystem is powered on.
Allegedly, this is fixed on the upcoming firmware but it is not
something I have verified personally. I will verify it in 10 days or
so, if the new firmware does not fail QA I guess.

However, there is a second bug with Display Off in _DSM 4. The
controller of the Ally needs time to power off, around 500ms.
Otherwise it gets its power clipped and/or does not power off
correctly. This causes the issues mentioned in the discussion and I
have no indication that this is fixed with newer controller firmware.
It is also my understanding that most of the testing of the new
firmware happened with the asus-wmi quirk in place, which papers over
that issue, so removing the quirk might be premature in any case.

We have currently released this patch series in Bazzite and I am happy
to report that it completely fixes all controller related issues in
the Ally devices and makes them behave exactly as they do in Windows,
regardless of firmware and bug for bug.

So we will be keeping it around and extending it as appropriate to
include the Sleep calls. I am reminded multiple times per week that
the Ally has TDP suspend bugs, where if the user is playing a heavy
game, the EC of the device tends to get stuck at 6W and fail to
respond after waking the device. So moving calls 7, 8 is the natural
next step in this investigation. I already have a draft patch on
standby, that we plan to AB test soon.

> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> checkpatch. Also the commit message of patch 3 should point to the existing
> quirk code in asus-wmi.c and mention that then is no longer necessary after
> patch 3, then we can discuss what is the best place for these quirks.

I did run it through before sending the patch. However, some of the
warnings were a bit cryptic to me... I will run it again.

I will add a note for asus-wmi on future patch series.

First 3 patches of the series are designed to NOOP before patch 4. Did
you mean patch 3 (which adds the delay) instead of 4?

> Rafael, what do you think about at least taking patches 1 - 3 upstream?
> Reading through how Windows handles things making the display on/off
> calls before suspending devices sounds like it is the right thing to do
> to me.

Antheas
Hans de Goede Oct. 5, 2024, 4:24 p.m. UTC | #3
Hi Antheas,

On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> Hi Hans,
> 
>> Thank you for your work on this and thank you for the comprehensive write-up
>> on how Windows does modern standby.
>>
>> First of all may I suggest that you take the above write-up, minus the ROG
>> Ally specific bits and turn this into a new documentation file under
>> Documentation/power ?  And also document at which point Linux currently
>> makes the various transitions.
> 
> I will try to move some of that documentation there, this is a great idea.
> 
>> And then in patches where you move the transitions, also update the docs
>> on what Linux does to match.
>>
>> I have read the discussion about tying the display on/off calls to CRTC state
>> and/or exposing a userspace knob for that. I think that this needs more
>> discussion / design work.
> 
> Yes, you are right. To become a knob this would require a much bigger
> discussion. I would also like to move Sleep calls as part of that. The
> Legion Go and OneXPlayer devices turn off their controllers as part of
> that and other modern standby devices limit their power envelope
> (perhaps the Legion Go too). I think the Sleep call is where most of
> the userspace usability will come from. Display On/Off is a bit of a
> NOOP on most devices.
> 
> As for the LSB0 enter and exit, I do not know where the correct place
> for those would be, and perhaps someone from Microsoft needs to be
> consulted on that. The documentation is very vague. However it is
> clear to me that they should be close to where they are right now, so
> they very likely do not need to move.
> 
> There is also the new _DSM intent to turn display on 9 call. Which
> meshes with the sleep call. That call is made before Sleep Exit, if
> the kernel knows that the wake-up will cause the display to turn on,
> to boost the thermal envelope of the device and help it wake up
> quicker. If the Sleep call is moved then we would also have to
> introduce that somewhere to avoid wake-up time regressions on devices
> that support it, which also raises the question of how would the
> kernel decide if an interrupt will cause the display to turn on before
> unfreezing userspace (bulk of resume) (or should it be done after
> unfreezing?).
> 
>> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
>> be good to have. 3 is a bit unfortunate and not necessary with the current
>> special ROG Ally handling in the asus-wmi driver. It might be better to
>> just keep the quirks there.
> 
> From what I know Luke plans to remove that quirk ASAP due to new
> firmware. I would keep it around until this patch series merges
> personally and remove it as part of that.

Ack I replied to Luke to say something like that.

> As it will probably cause regressions if both are in place

I don't see how having both this patch-sets + the asus-wmi
quirks will cause regressions?  The suspend delay will be done
twice, but that is harmless. 

> and require manual intervention if
> either is not. I will also note that the quirk in asus-wmi calls the
> Display On/Off calls a second time and during the suspend sequence,
> which is not in any way proper.

AFAICT asus-wmi does not call the display on / off callbacks instead
it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?

> So if future devices need this kind of
> quirk, it really does not seem like a good idea to me to paper over
> their problems by calling the notifications a second time in random
> platform drivers. There is the question of where that quirk should be
> placed, that is true, but I IMO it should be a pm problem.
> 
> Perhaps not in the location where I put it though and perhaps it
> should be done with LSB0 callbacks instead. Although, being done this
> way allows for it to blend with the suspend sequence. Ideally, the
> Display Off delay would be blended with userspace going down such that
> if e.g., there is heavy userspace activity that requires ~2s to
> freeze, the quirk would add no delay. Instead, it would only add delay
> if userspace freezes quickly (less than .5s). Same can be said with
> Sleep Entry and beginning prepare_late, which blocks the EC interrupts
> (that would need a lot of investigation though).
> 
> On that note, it seems to me that the Ally has 2 bugs related to the
> _DSM calls 3 and 4. First bug is that Display On is gated on current
> firmware and only works when the USB subsystem is powered on.
> Allegedly, this is fixed on the upcoming firmware but it is not
> something I have verified personally. I will verify it in 10 days or
> so, if the new firmware does not fail QA I guess.
> 
> However, there is a second bug with Display Off in _DSM 4. The
> controller of the Ally needs time to power off, around 500ms.
> Otherwise it gets its power clipped and/or does not power off
> correctly. This causes the issues mentioned in the discussion and I
> have no indication that this is fixed with newer controller firmware.
> It is also my understanding that most of the testing of the new
> firmware happened with the asus-wmi quirk in place, which papers over
> that issue, so removing the quirk might be premature in any case.

Ok.

> We have currently released this patch series in Bazzite and I am happy
> to report that it completely fixes all controller related issues in
> the Ally devices and makes them behave exactly as they do in Windows,
> regardless of firmware and bug for bug.
> 
> So we will be keeping it around and extending it as appropriate to
> include the Sleep calls. I am reminded multiple times per week that
> the Ally has TDP suspend bugs, where if the user is playing a heavy
> game, the EC of the device tends to get stuck at 6W and fail to
> respond after waking the device. So moving calls 7, 8 is the natural
> next step in this investigation. I already have a draft patch on
> standby, that we plan to AB test soon.
> 
>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>> checkpatch. Also the commit message of patch 3 should point to the existing
>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>> patch 3, then we can discuss what is the best place for these quirks.
> 
> I did run it through before sending the patch. However, some of the
> warnings were a bit cryptic to me... I will run it again.
> 
> I will add a note for asus-wmi on future patch series.
> 
> First 3 patches of the series are designed to NOOP before patch 4. Did
> you mean patch 3 (which adds the delay) instead of 4?

Ah I misread the series and failed to notice that patch 4 actually hooks
things up, I was under the impression that patch 4 hooks things up.

But I did mean that patch 3 might lead to discussion not patch 4.

Now that I have taken a better look some more detailed review comments:

* Patches 1 and 2 should be squashed into a single patch IMHO.

* Patch 3 adds the quirks to kernel/power/suspend.c but this
really should be added to drivers/acpi/x86/s2idle.c and then do
the sleep as part of the display_off callback.

* Given my comment on patch 3 I would swap the order of patch 3 and 4
  and only add the quirks after moving the display off ACPI call to
  the new callback

* Patch 4 does too much in a single patch, specifically besides
actually implementing the new callbacks it also does s/SCREEN/DISPLAY
on all the ACPI_LPS0_* defines. This renaming of the defines must
be done in a separate patch.

Regards,

Hans
Hans de Goede Oct. 5, 2024, 4:27 p.m. UTC | #4
p.s.

On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> Hi Antheas,
> 
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
>> Hi Hans,
>>

<snip>

>>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>>> checkpatch. Also the commit message of patch 3 should point to the existing
>>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>>> patch 3, then we can discuss what is the best place for these quirks.
>>
>> I did run it through before sending the patch. However, some of the
>> warnings were a bit cryptic to me... I will run it again.
>>
>> I will add a note for asus-wmi on future patch series.
>>
>> First 3 patches of the series are designed to NOOP before patch 4. Did
>> you mean patch 3 (which adds the delay) instead of 4?
> 
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
> 
> But I did mean that patch 3 might lead to discussion not patch 4.

Oh and upon re-reading the series I see that pathc 5 is just dropping
the quirks from asus-wmi.c, which is fine.

I somehow thought that the later patches where adding a way for userspace
to already enter the LPS0 display off state earlier. No idea how that
idea got in my head ...

Regards,

Hans
Antheas Kapenekakis Oct. 5, 2024, 4:50 p.m. UTC | #5
On Sat, 5 Oct 2024 at 18:27, Hans de Goede <hdegoede@redhat.com> wrote:
>
> p.s.
>
> On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> > Hi Antheas,
> >
> > On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> >> Hi Hans,
> >>
>
> <snip>
>
> >>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> >>> checkpatch. Also the commit message of patch 3 should point to the existing
> >>> quirk code in asus-wmi.c and mention that then is no longer necessary after
> >>> patch 3, then we can discuss what is the best place for these quirks.
> >>
> >> I did run it through before sending the patch. However, some of the
> >> warnings were a bit cryptic to me... I will run it again.
> >>
> >> I will add a note for asus-wmi on future patch series.
> >>
> >> First 3 patches of the series are designed to NOOP before patch 4. Did
> >> you mean patch 3 (which adds the delay) instead of 4?
> >
> > Ah I misread the series and failed to notice that patch 4 actually hooks
> > things up, I was under the impression that patch 4 hooks things up.
> >
> > But I did mean that patch 3 might lead to discussion not patch 4.
>
> Oh and upon re-reading the series I see that pathc 5 is just dropping
> the quirks from asus-wmi.c, which is fine.
>
> I somehow thought that the later patches where adding a way for userspace
> to already enter the LPS0 display off state earlier. No idea how that
> idea got in my head ...

Done this way to hopefully be easier to upstream and get this fix out
sooner. The plan here would be 3 series: 1) move Display On/Off +
quirk, 2) move Sleep Entry/Exit + Quirk, 3) RFC for exposing to
userspace, in which case if the kernel starts to suspend while in
standby it would skip those calls.

Antheas
Antheas Kapenekakis Oct. 5, 2024, 4:57 p.m. UTC | #6
On Sat, 5 Oct 2024 at 18:24, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Antheas,
>
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> > Hi Hans,
> >
> >> Thank you for your work on this and thank you for the comprehensive write-up
> >> on how Windows does modern standby.
> >>
> >> First of all may I suggest that you take the above write-up, minus the ROG
> >> Ally specific bits and turn this into a new documentation file under
> >> Documentation/power ?  And also document at which point Linux currently
> >> makes the various transitions.
> >
> > I will try to move some of that documentation there, this is a great idea.
> >
> >> And then in patches where you move the transitions, also update the docs
> >> on what Linux does to match.
> >>
> >> I have read the discussion about tying the display on/off calls to CRTC state
> >> and/or exposing a userspace knob for that. I think that this needs more
> >> discussion / design work.
> >
> > Yes, you are right. To become a knob this would require a much bigger
> > discussion. I would also like to move Sleep calls as part of that. The
> > Legion Go and OneXPlayer devices turn off their controllers as part of
> > that and other modern standby devices limit their power envelope
> > (perhaps the Legion Go too). I think the Sleep call is where most of
> > the userspace usability will come from. Display On/Off is a bit of a
> > NOOP on most devices.
> >
> > As for the LSB0 enter and exit, I do not know where the correct place
> > for those would be, and perhaps someone from Microsoft needs to be
> > consulted on that. The documentation is very vague. However it is
> > clear to me that they should be close to where they are right now, so
> > they very likely do not need to move.
> >
> > There is also the new _DSM intent to turn display on 9 call. Which
> > meshes with the sleep call. That call is made before Sleep Exit, if
> > the kernel knows that the wake-up will cause the display to turn on,
> > to boost the thermal envelope of the device and help it wake up
> > quicker. If the Sleep call is moved then we would also have to
> > introduce that somewhere to avoid wake-up time regressions on devices
> > that support it, which also raises the question of how would the
> > kernel decide if an interrupt will cause the display to turn on before
> > unfreezing userspace (bulk of resume) (or should it be done after
> > unfreezing?).
> >
> >> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
> >> be good to have. 3 is a bit unfortunate and not necessary with the current
> >> special ROG Ally handling in the asus-wmi driver. It might be better to
> >> just keep the quirks there.
> >
> > From what I know Luke plans to remove that quirk ASAP due to new
> > firmware. I would keep it around until this patch series merges
> > personally and remove it as part of that.
>
> Ack I replied to Luke to say something like that.
>
> > As it will probably cause regressions if both are in place
>
> I don't see how having both this patch-sets + the asus-wmi
> quirks will cause regressions?  The suspend delay will be done
> twice, but that is harmless.

Probably it will be harmless, but I think the Display On being done
twice, and one of the times being inside the suspend sequence might
result in sub-optimal behavior. Well, the behavior that exists now.

> > and require manual intervention if
> > either is not. I will also note that the quirk in asus-wmi calls the
> > Display On/Off calls a second time and during the suspend sequence,
> > which is not in any way proper.
>
> AFAICT asus-wmi does not call the display on / off callbacks instead
> it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?

Refer to [1]. CSEE with B7, B8 is in fact the _DSM 3,4 Display On/Off
internal call. There is also a spurious Lid switch call on Display On
that does not exist and causes dmesg errors.

Link: https://github.com/hhd-dev/hwinfo/blob/21b7442219922233c41c0568995214ba92873f69/devices/ally_x/decoded/ssdt28.dsl#L841-L855
[1]

> > So if future devices need this kind of
> > quirk, it really does not seem like a good idea to me to paper over
> > their problems by calling the notifications a second time in random
> > platform drivers. There is the question of where that quirk should be
> > placed, that is true, but I IMO it should be a pm problem.
> >
> > Perhaps not in the location where I put it though and perhaps it
> > should be done with LSB0 callbacks instead. Although, being done this
> > way allows for it to blend with the suspend sequence. Ideally, the
> > Display Off delay would be blended with userspace going down such that
> > if e.g., there is heavy userspace activity that requires ~2s to
> > freeze, the quirk would add no delay. Instead, it would only add delay
> > if userspace freezes quickly (less than .5s). Same can be said with
> > Sleep Entry and beginning prepare_late, which blocks the EC interrupts
> > (that would need a lot of investigation though).
> >
> > On that note, it seems to me that the Ally has 2 bugs related to the
> > _DSM calls 3 and 4. First bug is that Display On is gated on current
> > firmware and only works when the USB subsystem is powered on.
> > Allegedly, this is fixed on the upcoming firmware but it is not
> > something I have verified personally. I will verify it in 10 days or
> > so, if the new firmware does not fail QA I guess.
> >
> > However, there is a second bug with Display Off in _DSM 4. The
> > controller of the Ally needs time to power off, around 500ms.
> > Otherwise it gets its power clipped and/or does not power off
> > correctly. This causes the issues mentioned in the discussion and I
> > have no indication that this is fixed with newer controller firmware.
> > It is also my understanding that most of the testing of the new
> > firmware happened with the asus-wmi quirk in place, which papers over
> > that issue, so removing the quirk might be premature in any case.
>
> Ok.
>
> > We have currently released this patch series in Bazzite and I am happy
> > to report that it completely fixes all controller related issues in
> > the Ally devices and makes them behave exactly as they do in Windows,
> > regardless of firmware and bug for bug.
> >
> > So we will be keeping it around and extending it as appropriate to
> > include the Sleep calls. I am reminded multiple times per week that
> > the Ally has TDP suspend bugs, where if the user is playing a heavy
> > game, the EC of the device tends to get stuck at 6W and fail to
> > respond after waking the device. So moving calls 7, 8 is the natural
> > next step in this investigation. I already have a draft patch on
> > standby, that we plan to AB test soon.
> >
> >> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> >> checkpatch. Also the commit message of patch 3 should point to the existing
> >> quirk code in asus-wmi.c and mention that then is no longer necessary after
> >> patch 3, then we can discuss what is the best place for these quirks.
> >
> > I did run it through before sending the patch. However, some of the
> > warnings were a bit cryptic to me... I will run it again.
> >
> > I will add a note for asus-wmi on future patch series.
> >
> > First 3 patches of the series are designed to NOOP before patch 4. Did
> > you mean patch 3 (which adds the delay) instead of 4?
>
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
>
> But I did mean that patch 3 might lead to discussion not patch 4.
>
> Now that I have taken a better look some more detailed review comments:
>
> * Patches 1 and 2 should be squashed into a single patch IMHO.
>
> * Patch 3 adds the quirks to kernel/power/suspend.c but this
> really should be added to drivers/acpi/x86/s2idle.c and then do
> the sleep as part of the display_off callback.
>
> * Given my comment on patch 3 I would swap the order of patch 3 and 4
>   and only add the quirks after moving the display off ACPI call to
>   the new callback
>
> * Patch 4 does too much in a single patch, specifically besides
> actually implementing the new callbacks it also does s/SCREEN/DISPLAY
> on all the ACPI_LPS0_* defines. This renaming of the defines must
> be done in a separate patch.

All are fair comments. I will fix them on the next revision.

On the Patch 3 comment, do you think there is merit with blending the
quirk with userspace freezing? Moving it inside LPS0 would make that
difficult, however at the same time 500ms for just the Ally (and
perhaps 2-3 other affected devices) is not something particularly
noticeable anyway.

Antheas
Hans de Goede Oct. 5, 2024, 9:47 p.m. UTC | #7
Hi Antheas,

On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> Hi Antheas,
> 
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
>> Hi Hans,
>>
>>> Thank you for your work on this and thank you for the comprehensive write-up
>>> on how Windows does modern standby.
>>>
>>> First of all may I suggest that you take the above write-up, minus the ROG
>>> Ally specific bits and turn this into a new documentation file under
>>> Documentation/power ?  And also document at which point Linux currently
>>> makes the various transitions.
>>
>> I will try to move some of that documentation there, this is a great idea.
>>
>>> And then in patches where you move the transitions, also update the docs
>>> on what Linux does to match.
>>>
>>> I have read the discussion about tying the display on/off calls to CRTC state
>>> and/or exposing a userspace knob for that. I think that this needs more
>>> discussion / design work.
>>
>> Yes, you are right. To become a knob this would require a much bigger
>> discussion. I would also like to move Sleep calls as part of that. The
>> Legion Go and OneXPlayer devices turn off their controllers as part of
>> that and other modern standby devices limit their power envelope
>> (perhaps the Legion Go too). I think the Sleep call is where most of
>> the userspace usability will come from. Display On/Off is a bit of a
>> NOOP on most devices.
>>
>> As for the LSB0 enter and exit, I do not know where the correct place
>> for those would be, and perhaps someone from Microsoft needs to be
>> consulted on that. The documentation is very vague. However it is
>> clear to me that they should be close to where they are right now, so
>> they very likely do not need to move.
>>
>> There is also the new _DSM intent to turn display on 9 call. Which
>> meshes with the sleep call. That call is made before Sleep Exit, if
>> the kernel knows that the wake-up will cause the display to turn on,
>> to boost the thermal envelope of the device and help it wake up
>> quicker. If the Sleep call is moved then we would also have to
>> introduce that somewhere to avoid wake-up time regressions on devices
>> that support it, which also raises the question of how would the
>> kernel decide if an interrupt will cause the display to turn on before
>> unfreezing userspace (bulk of resume) (or should it be done after
>> unfreezing?).
>>
>>> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
>>> be good to have. 3 is a bit unfortunate and not necessary with the current
>>> special ROG Ally handling in the asus-wmi driver. It might be better to
>>> just keep the quirks there.
>>
>> From what I know Luke plans to remove that quirk ASAP due to new
>> firmware. I would keep it around until this patch series merges
>> personally and remove it as part of that.
> 
> Ack I replied to Luke to say something like that.
> 
>> As it will probably cause regressions if both are in place
> 
> I don't see how having both this patch-sets + the asus-wmi
> quirks will cause regressions?  The suspend delay will be done
> twice, but that is harmless. 
> 
>> and require manual intervention if
>> either is not. I will also note that the quirk in asus-wmi calls the
>> Display On/Off calls a second time and during the suspend sequence,
>> which is not in any way proper.
> 
> AFAICT asus-wmi does not call the display on / off callbacks instead
> it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?
> 
>> So if future devices need this kind of
>> quirk, it really does not seem like a good idea to me to paper over
>> their problems by calling the notifications a second time in random
>> platform drivers. There is the question of where that quirk should be
>> placed, that is true, but I IMO it should be a pm problem.
>>
>> Perhaps not in the location where I put it though and perhaps it
>> should be done with LSB0 callbacks instead. Although, being done this
>> way allows for it to blend with the suspend sequence. Ideally, the
>> Display Off delay would be blended with userspace going down such that
>> if e.g., there is heavy userspace activity that requires ~2s to
>> freeze, the quirk would add no delay. Instead, it would only add delay
>> if userspace freezes quickly (less than .5s). Same can be said with
>> Sleep Entry and beginning prepare_late, which blocks the EC interrupts
>> (that would need a lot of investigation though).
>>
>> On that note, it seems to me that the Ally has 2 bugs related to the
>> _DSM calls 3 and 4. First bug is that Display On is gated on current
>> firmware and only works when the USB subsystem is powered on.
>> Allegedly, this is fixed on the upcoming firmware but it is not
>> something I have verified personally. I will verify it in 10 days or
>> so, if the new firmware does not fail QA I guess.
>>
>> However, there is a second bug with Display Off in _DSM 4. The
>> controller of the Ally needs time to power off, around 500ms.
>> Otherwise it gets its power clipped and/or does not power off
>> correctly. This causes the issues mentioned in the discussion and I
>> have no indication that this is fixed with newer controller firmware.
>> It is also my understanding that most of the testing of the new
>> firmware happened with the asus-wmi quirk in place, which papers over
>> that issue, so removing the quirk might be premature in any case.
> 
> Ok.
> 
>> We have currently released this patch series in Bazzite and I am happy
>> to report that it completely fixes all controller related issues in
>> the Ally devices and makes them behave exactly as they do in Windows,
>> regardless of firmware and bug for bug.
>>
>> So we will be keeping it around and extending it as appropriate to
>> include the Sleep calls. I am reminded multiple times per week that
>> the Ally has TDP suspend bugs, where if the user is playing a heavy
>> game, the EC of the device tends to get stuck at 6W and fail to
>> respond after waking the device. So moving calls 7, 8 is the natural
>> next step in this investigation. I already have a draft patch on
>> standby, that we plan to AB test soon.
>>
>>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>>> checkpatch. Also the commit message of patch 3 should point to the existing
>>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>>> patch 3, then we can discuss what is the best place for these quirks.
>>
>> I did run it through before sending the patch. However, some of the
>> warnings were a bit cryptic to me... I will run it again.
>>
>> I will add a note for asus-wmi on future patch series.
>>
>> First 3 patches of the series are designed to NOOP before patch 4. Did
>> you mean patch 3 (which adds the delay) instead of 4?
> 
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
> 
> But I did mean that patch 3 might lead to discussion not patch 4.
> 
> Now that I have taken a better look some more detailed review comments:
> 
> * Patches 1 and 2 should be squashed into a single patch IMHO.
> 
> * Patch 3 adds the quirks to kernel/power/suspend.c but this
> really should be added to drivers/acpi/x86/s2idle.c and then do
> the sleep as part of the display_off callback.
> 
> * Given my comment on patch 3 I would swap the order of patch 3 and 4
>   and only add the quirks after moving the display off ACPI call to
>   the new callback
> 
> * Patch 4 does too much in a single patch, specifically besides
> actually implementing the new callbacks it also does s/SCREEN/DISPLAY
> on all the ACPI_LPS0_* defines. This renaming of the defines must
> be done in a separate patch.

Thinking some more about this I am having second doubts about
moving the LPS0 display power off call to before devices are suspended,
doing so would mean that the display might still be on when that call
is made and that call could disable power-resources which are necessary
for the display causing issues when the display driver's suspend method
runs.

So I think that we need something closer to Mario's POC from:

https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off

here where the call is made when the last display is turned off.

IOW have the drm modesetting core call this.

Maybe have something like a enabled_displays counter in the
drm-core which gets increased / decreased by helpers and
have the drm-core call platform_suspend_screen_off() /
platform_suspend_screen_on() when the counter goes from 1 -> 0
resp. 0 -> 1, ignoring the very first 0 -> 1 transition
which will be done when the first GPU with an enabled
output is found ?

The idea being that the first increase() call gets made when
a drm/kms driver probes a display and finds outputs which are
light up during probe() and then further increase / decrease
calls are made either when all displays go off; or maybe
per crtc when the crtc gets enabled / disabled.

Anyways how best to do this at display off time should be
discussed with the drm/kms community on the dri-devel list.

Regards,

Hans
Antheas Kapenekakis Oct. 5, 2024, 10:15 p.m. UTC | #8
<skip>

> Thinking some more about this I am having second doubts about
> moving the LPS0 display power off call to before devices are suspended,
> doing so would mean that the display might still be on when that call
> is made and that call could disable power-resources which are necessary
> for the display causing issues when the display driver's suspend method
> runs.

Is there any device where that is used for display powersaving?

> So I think that we need something closer to Mario's POC from:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off
>
> here where the call is made when the last display is turned off.
>
> IOW have the drm modesetting core call this.

I can see two problems with this approach: 1) it would happen in a
random point in the suspend sequence, introducing race conditions with
sensitive modern standby devices (e.g., Ally). 2) It would not be
gated and debounced properly, so a drm driver could call it 5 times
when you e.g., plug in an HDMI cable.

And indeed that is the case, that PR horribly breaks the Ally even
while any asus-wmi quirk was active. Perhaps DRM can be consulted
though, see below.

> Maybe have something like a enabled_displays counter in the
> drm-core which gets increased / decreased by helpers and
> have the drm-core call platform_suspend_screen_off() /
> platform_suspend_screen_on() when the counter goes from 1 -> 0
> resp. 0 -> 1, ignoring the very first 0 -> 1 transition
> which will be done when the first GPU with an enabled
> output is found ?

To quote Microsoft [1], "This _DSM Function will be invoked when the
operating system has entered a state where all displays—local and
*remote*, if any—have been turned off."

Since it says remote, binding it to DRM could prove difficult. In
addition, the call in Windows is made 5 seconds after the displays
turn off due to inactivity. To mirror this behavior you would need
userspace.

If there is strong indication that the Display On/Off calls interfere
with the DRM subsystem, e.g., turn off a GPU in certain laptops, the
call could be gated with a counter similar to Mario's PR and error
out. In that way, it is still controllable by userspace while ensuring
the device display is off. Is there such an indication/do we know of
such a device?

The Ally and Legion Go which I tested happily had their display turn
on after I yanked their display on and sleep exit callbacks. The
Legion Go even had its suspend light blink while the screen was on.
And both had disabled controllers. This behavior was sticky even after
a reboot. I suppose this is due to the fact that the device might
hibernate, so the EC would have to remember the last state before
power off.

> The idea being that the first increase() call gets made when
> a drm/kms driver probes a display and finds outputs which are
> light up during probe() and then further increase / decrease
> calls are made either when all displays go off; or maybe
> per crtc when the crtc gets enabled / disabled.
>
> Anyways how best to do this at display off time should be
> discussed with the drm/kms community on the dri-devel list.

I can cc on the next version.

Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications#display-off-notification-function-3
[1]

Antheas
Hans de Goede Oct. 6, 2024, 10:16 a.m. UTC | #9
Hi,

On 6-Oct-24 12:15 AM, Antheas Kapenekakis wrote:
> <skip>
> 
>> Thinking some more about this I am having second doubts about
>> moving the LPS0 display power off call to before devices are suspended,
>> doing so would mean that the display might still be on when that call
>> is made and that call could disable power-resources which are necessary
>> for the display causing issues when the display driver's suspend method
>> runs.
> 
> Is there any device where that is used for display powersaving?

The problem is that we cannot rule out that the LPS0 display off
call relies on the displays actually being off.

I have seen ACPI AML code do all sort of crazy stuff.

So IMHO we really need to make sure that all physical displays
are off before we make the LPS0 display off call.

I have read what you wrote about this also applying to virtual
displays, I guess that means that there should be no rendering done
(so also no GPU non display tasks) when this is called.

IOW it might be best to tie this to all VGA class PCI devices
being in D3 as Mario suggested.

Regards,

Hans








>> So I think that we need something closer to Mario's POC from:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off
>>
>> here where the call is made when the last display is turned off.
>>
>> IOW have the drm modesetting core call this.
> 
> I can see two problems with this approach: 1) it would happen in a
> random point in the suspend sequence, introducing race conditions with
> sensitive modern standby devices (e.g., Ally). 2) It would not be
> gated and debounced properly, so a drm driver could call it 5 times
> when you e.g., plug in an HDMI cable.
> 
> And indeed that is the case, that PR horribly breaks the Ally even
> while any asus-wmi quirk was active. Perhaps DRM can be consulted
> though, see below.
> 
>> Maybe have something like a enabled_displays counter in the
>> drm-core which gets increased / decreased by helpers and
>> have the drm-core call platform_suspend_screen_off() /
>> platform_suspend_screen_on() when the counter goes from 1 -> 0
>> resp. 0 -> 1, ignoring the very first 0 -> 1 transition
>> which will be done when the first GPU with an enabled
>> output is found ?
> 
> To quote Microsoft [1], "This _DSM Function will be invoked when the
> operating system has entered a state where all displays—local and
> *remote*, if any—have been turned off."
> 
> Since it says remote, binding it to DRM could prove difficult. In
> addition, the call in Windows is made 5 seconds after the displays
> turn off due to inactivity. To mirror this behavior you would need
> userspace.
> 
> If there is strong indication that the Display On/Off calls interfere
> with the DRM subsystem, e.g., turn off a GPU in certain laptops, the
> call could be gated with a counter similar to Mario's PR and error
> out. In that way, it is still controllable by userspace while ensuring
> the device display is off. Is there such an indication/do we know of
> such a device?
> 
> The Ally and Legion Go which I tested happily had their display turn
> on after I yanked their display on and sleep exit callbacks. The
> Legion Go even had its suspend light blink while the screen was on.
> And both had disabled controllers. This behavior was sticky even after
> a reboot. I suppose this is due to the fact that the device might
> hibernate, so the EC would have to remember the last state before
> power off.
> 
>> The idea being that the first increase() call gets made when
>> a drm/kms driver probes a display and finds outputs which are
>> light up during probe() and then further increase / decrease
>> calls are made either when all displays go off; or maybe
>> per crtc when the crtc gets enabled / disabled.
>>
>> Anyways how best to do this at display off time should be
>> discussed with the drm/kms community on the dri-devel list.
> 
> I can cc on the next version.
> 
> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications#display-off-notification-function-3
> [1]
> 
> Antheas
>
Antheas Kapenekakis Oct. 6, 2024, 11:29 a.m. UTC | #10
Hi Hans,
there is no rush from my end to see this series merge. The current
asus-wmi quirk works well for the Ally, all firmwares. New firmware is
also supposed to fix powersaving with it.

Yes, that quirk is suboptimal as it adds a noticeable delay to suspend
and resume and blocks powersaving from working correctly in old
firmwares. However, as most Ally users are on custom kernels anyway,
and this patch series can be merged quite easily into them, broadly
speaking it is a non-issue if the mainline kernel has ideal behavior
on the Ally for the next few months. From user feedback, they did not
notice changing to this patch anyway, other than the powersaving
benefit ("did you change anything? my ally just uses less power now").

Next revision I will cc dri so we get feedback from there too.

> >> Thinking some more about this I am having second doubts about
> >> moving the LPS0 display power off call to before devices are suspended,
> >> doing so would mean that the display might still be on when that call
> >> is made and that call could disable power-resources which are necessary
> >> for the display causing issues when the display driver's suspend method
> >> runs.
> >
> > Is there any device where that is used for display powersaving?
>
> The problem is that we cannot rule out that the LPS0 display off
> call relies on the displays actually being off.
>
> I have seen ACPI AML code do all sort of crazy stuff.

Indeed, as this series shows (for other reasons).

> So IMHO we really need to make sure that all physical displays
> are off before we make the LPS0 display off call.

In my use-case I'd like to be able to fire the display off
notification prior to turning off the screen or turn on the screen
after both Display Off and Sleep Entry have fired to be able to show a
dim lockscreen if the user briefly interacts with the device. I will
be doing this downstream for a limited set of prevalidated devices
though.

Therefore, before we block this behavior in a non-certain manner (as
it will be up to each gpu driver to do it), there needs to be some
documentation showing there is an issue, certain manufacturers rely on
the behavior, or that Microsoft has guaranteed that this is the case
in Windows. Even with any of the former, a blacklist quirk for those
manufacturers can be put in place in their GPU driver that blocks the
Display Off _DSM from firing and automatically calls the Display On
_DSM if the GPU wants to wake up the display.

> I have read what you wrote about this also applying to virtual
> displays, I guess that means that there should be no rendering done
> (so also no GPU non display tasks) when this is called.
>
> IOW it might be best to tie this to all VGA class PCI devices
> being in D3 as Mario suggested.

GPU can do stuff while the screen is off: render videos, hold browser
videos in main memory, have a game in the background, etc. If by D3
you mean the whole GPU has powered off, this would be a deviation from
Windows.

Antheas