diff mbox series

[v2,8/8] drm/amdgpu: Call drm_atomic_helper_shutdown() at shutdown time

Message ID 20240612152752.v2.8.I27914059cc822b52db9bf72b4013b525b60e06fd@changeid (mailing list archive)
State New, archived
Headers show
Series drm: make leftover drivers call drm_atomic_helper_shutdown() at the right times | expand

Commit Message

Doug Anderson June 12, 2024, 10:28 p.m. UTC
Based on grepping through the source code this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown
time. Among other things, this means that if a panel is in use that it
won't be cleanly powered off at system shutdown time.

The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart comes straight out of the kernel doc "driver
instance overview" in drm_drv.c.

Suggested-by: Maxime Ripard <mripard@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
This commit is only compile-time tested.

...and further, I'd say that this patch is more of a plea for help
than a patch I think is actually right. I'm _fairly_ certain that
drm/amdgpu needs this call at shutdown time but the logic is a bit
hard for me to follow. I'd appreciate if anyone who actually knows
what this should look like could illuminate me, or perhaps even just
post a patch themselves!

(no changes since v1)

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  2 ++
 3 files changed, 13 insertions(+)

Comments

Alex Deucher June 17, 2024, 3:01 p.m. UTC | #1
On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> Based on grepping through the source code this driver appears to be
> missing a call to drm_atomic_helper_shutdown() at system shutdown
> time. Among other things, this means that if a panel is in use that it
> won't be cleanly powered off at system shutdown time.
>
> The fact that we should call drm_atomic_helper_shutdown() in the case
> of OS shutdown/restart comes straight out of the kernel doc "driver
> instance overview" in drm_drv.c.
>
> Suggested-by: Maxime Ripard <mripard@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> This commit is only compile-time tested.
>
> ...and further, I'd say that this patch is more of a plea for help
> than a patch I think is actually right. I'm _fairly_ certain that
> drm/amdgpu needs this call at shutdown time but the logic is a bit
> hard for me to follow. I'd appreciate if anyone who actually knows
> what this should look like could illuminate me, or perhaps even just
> post a patch themselves!

I'm not sure this patch makes sense or not.  The driver doesn't really
do a formal tear down in its shutdown routine, it just quiesces the
hardware.  What are the actual requirements of the shutdown function?
In the past when we did a full driver tear down in shutdown, it
delayed the shutdown sequence and users complained.

Alex

>
> (no changes since v1)
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  2 ++
>  3 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index f87d53e183c3..c202a1d5ff5f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1197,6 +1197,7 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_device *bdev)
>  int amdgpu_device_init(struct amdgpu_device *adev,
>                        uint32_t flags);
>  void amdgpu_device_fini_hw(struct amdgpu_device *adev);
> +void amdgpu_device_shutdown_hw(struct amdgpu_device *adev);
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev);
>
>  int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 861ccff78af9..a8c4b8412e04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4531,6 +4531,16 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>
>  }
>
> +void amdgpu_device_shutdown_hw(struct amdgpu_device *adev)
> +{
> +       if (adev->mode_info.mode_config_initialized) {
> +               if (!drm_drv_uses_atomic_modeset(adev_to_drm(adev)))
> +                       drm_helper_force_disable_all(adev_to_drm(adev));
> +               else
> +                       drm_atomic_helper_shutdown(adev_to_drm(adev));
> +       }
> +}
> +
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>  {
>         int idx;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index ea14f1c8f430..b34bf9259d5c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2409,6 +2409,8 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>         struct drm_device *dev = pci_get_drvdata(pdev);
>         struct amdgpu_device *adev = drm_to_adev(dev);
>
> +       amdgpu_device_shutdown_hw(adev);
> +
>         if (amdgpu_ras_intr_triggered())
>                 return;
>
> --
> 2.45.2.505.gda0bf45e8d-goog
>
Doug Anderson June 18, 2024, 9:34 p.m. UTC | #2
Hi,


On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > Based on grepping through the source code this driver appears to be
> > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > time. Among other things, this means that if a panel is in use that it
> > won't be cleanly powered off at system shutdown time.
> >
> > The fact that we should call drm_atomic_helper_shutdown() in the case
> > of OS shutdown/restart comes straight out of the kernel doc "driver
> > instance overview" in drm_drv.c.
> >
> > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > This commit is only compile-time tested.
> >
> > ...and further, I'd say that this patch is more of a plea for help
> > than a patch I think is actually right. I'm _fairly_ certain that
> > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > hard for me to follow. I'd appreciate if anyone who actually knows
> > what this should look like could illuminate me, or perhaps even just
> > post a patch themselves!
>
> I'm not sure this patch makes sense or not.  The driver doesn't really
> do a formal tear down in its shutdown routine, it just quiesces the
> hardware.  What are the actual requirements of the shutdown function?
> In the past when we did a full driver tear down in shutdown, it
> delayed the shutdown sequence and users complained.

The "inspiration" for this patch is to handle panels properly.
Specifically, panels often have several power/enable signals going to
them and often have requirements that these signals are powered off in
the proper order with the proper delays between them. While we can't
always do so when the system crashes / reboots in an uncontrolled way,
panel manufacturers / HW Engineers get upset if we don't power things
off properly during an orderly shutdown/reboot. When panels are
powered off badly it can cause garbage on the screen and, so I've been
told, can even cause long term damage to the panels over time.

In Linux, some panel drivers have tried to ensure a proper poweroff of
the panel by handling the shutdown() call themselves. However, this is
ugly and panel maintainers want panel drivers to stop doing it. We
have removed the code doing this from most panels now [1]. Instead the
assumption is that the DRM modeset drivers should be calling
drm_atomic_helper_shutdown() which will make sure panels get an
orderly shutdown.

For a lot more details, see the cover letter [2] which then contains
links to even more discussions about the topic.

[1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
[2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
Alex Deucher June 18, 2024, 9:59 p.m. UTC | #3
On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
>
> On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > >
> > > Based on grepping through the source code this driver appears to be
> > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > time. Among other things, this means that if a panel is in use that it
> > > won't be cleanly powered off at system shutdown time.
> > >
> > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > instance overview" in drm_drv.c.
> > >
> > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > This commit is only compile-time tested.
> > >
> > > ...and further, I'd say that this patch is more of a plea for help
> > > than a patch I think is actually right. I'm _fairly_ certain that
> > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > what this should look like could illuminate me, or perhaps even just
> > > post a patch themselves!
> >
> > I'm not sure this patch makes sense or not.  The driver doesn't really
> > do a formal tear down in its shutdown routine, it just quiesces the
> > hardware.  What are the actual requirements of the shutdown function?
> > In the past when we did a full driver tear down in shutdown, it
> > delayed the shutdown sequence and users complained.
>
> The "inspiration" for this patch is to handle panels properly.
> Specifically, panels often have several power/enable signals going to
> them and often have requirements that these signals are powered off in
> the proper order with the proper delays between them. While we can't
> always do so when the system crashes / reboots in an uncontrolled way,
> panel manufacturers / HW Engineers get upset if we don't power things
> off properly during an orderly shutdown/reboot. When panels are
> powered off badly it can cause garbage on the screen and, so I've been
> told, can even cause long term damage to the panels over time.
>
> In Linux, some panel drivers have tried to ensure a proper poweroff of
> the panel by handling the shutdown() call themselves. However, this is
> ugly and panel maintainers want panel drivers to stop doing it. We
> have removed the code doing this from most panels now [1]. Instead the
> assumption is that the DRM modeset drivers should be calling
> drm_atomic_helper_shutdown() which will make sure panels get an
> orderly shutdown.
>
> For a lot more details, see the cover letter [2] which then contains
> links to even more discussions about the topic.
>
> [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org

I don't think it's an issue.  We quiesce the hardware as if we were
about to suspend the system (e.g., S3).  For the display hardware we
call drm_atomic_helper_suspend() as part of that sequence.

Alex
Doug Anderson June 18, 2024, 11:52 p.m. UTC | #4
Hi,

On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> >
> > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > >
> > > > Based on grepping through the source code this driver appears to be
> > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > time. Among other things, this means that if a panel is in use that it
> > > > won't be cleanly powered off at system shutdown time.
> > > >
> > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > instance overview" in drm_drv.c.
> > > >
> > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > Cc: Christian König <christian.koenig@amd.com>
> > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > ---
> > > > This commit is only compile-time tested.
> > > >
> > > > ...and further, I'd say that this patch is more of a plea for help
> > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > what this should look like could illuminate me, or perhaps even just
> > > > post a patch themselves!
> > >
> > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > do a formal tear down in its shutdown routine, it just quiesces the
> > > hardware.  What are the actual requirements of the shutdown function?
> > > In the past when we did a full driver tear down in shutdown, it
> > > delayed the shutdown sequence and users complained.
> >
> > The "inspiration" for this patch is to handle panels properly.
> > Specifically, panels often have several power/enable signals going to
> > them and often have requirements that these signals are powered off in
> > the proper order with the proper delays between them. While we can't
> > always do so when the system crashes / reboots in an uncontrolled way,
> > panel manufacturers / HW Engineers get upset if we don't power things
> > off properly during an orderly shutdown/reboot. When panels are
> > powered off badly it can cause garbage on the screen and, so I've been
> > told, can even cause long term damage to the panels over time.
> >
> > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > the panel by handling the shutdown() call themselves. However, this is
> > ugly and panel maintainers want panel drivers to stop doing it. We
> > have removed the code doing this from most panels now [1]. Instead the
> > assumption is that the DRM modeset drivers should be calling
> > drm_atomic_helper_shutdown() which will make sure panels get an
> > orderly shutdown.
> >
> > For a lot more details, see the cover letter [2] which then contains
> > links to even more discussions about the topic.
> >
> > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
>
> I don't think it's an issue.  We quiesce the hardware as if we were
> about to suspend the system (e.g., S3).  For the display hardware we
> call drm_atomic_helper_suspend() as part of that sequence.

OK. It's no skin off my teeth and we can drop this patch if you're
convinced it's not needed. From the point of view of someone who has
no experience with this driver it seems weird to me that it would use
drm_atomic_helper_suspend() at shutdown time instead of the documented
drm_atomic_helper_shutdown(), but if it works for everyone then I'm
not gonna complain.

-Doug
Alex Deucher June 19, 2024, 1:50 p.m. UTC | #5
On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > >
> > > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > >
> > > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > >
> > > > > Based on grepping through the source code this driver appears to be
> > > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > > time. Among other things, this means that if a panel is in use that it
> > > > > won't be cleanly powered off at system shutdown time.
> > > > >
> > > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > > instance overview" in drm_drv.c.
> > > > >
> > > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > ---
> > > > > This commit is only compile-time tested.
> > > > >
> > > > > ...and further, I'd say that this patch is more of a plea for help
> > > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > > what this should look like could illuminate me, or perhaps even just
> > > > > post a patch themselves!
> > > >
> > > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > > do a formal tear down in its shutdown routine, it just quiesces the
> > > > hardware.  What are the actual requirements of the shutdown function?
> > > > In the past when we did a full driver tear down in shutdown, it
> > > > delayed the shutdown sequence and users complained.
> > >
> > > The "inspiration" for this patch is to handle panels properly.
> > > Specifically, panels often have several power/enable signals going to
> > > them and often have requirements that these signals are powered off in
> > > the proper order with the proper delays between them. While we can't
> > > always do so when the system crashes / reboots in an uncontrolled way,
> > > panel manufacturers / HW Engineers get upset if we don't power things
> > > off properly during an orderly shutdown/reboot. When panels are
> > > powered off badly it can cause garbage on the screen and, so I've been
> > > told, can even cause long term damage to the panels over time.
> > >
> > > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > > the panel by handling the shutdown() call themselves. However, this is
> > > ugly and panel maintainers want panel drivers to stop doing it. We
> > > have removed the code doing this from most panels now [1]. Instead the
> > > assumption is that the DRM modeset drivers should be calling
> > > drm_atomic_helper_shutdown() which will make sure panels get an
> > > orderly shutdown.
> > >
> > > For a lot more details, see the cover letter [2] which then contains
> > > links to even more discussions about the topic.
> > >
> > > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
> >
> > I don't think it's an issue.  We quiesce the hardware as if we were
> > about to suspend the system (e.g., S3).  For the display hardware we
> > call drm_atomic_helper_suspend() as part of that sequence.
>
> OK. It's no skin off my teeth and we can drop this patch if you're
> convinced it's not needed. From the point of view of someone who has
> no experience with this driver it seems weird to me that it would use
> drm_atomic_helper_suspend() at shutdown time instead of the documented
> drm_atomic_helper_shutdown(), but if it works for everyone then I'm
> not gonna complain.

I think the problem is that it is not clear exactly what the
expectations are around the PCI shutdown callback.  The documentation
says:

"Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
idling DMA operations. Useful for enabling wake-on-lan (NIC) or
changing the power state of a device before reboot. e.g.
drivers/net/e100.c."

We tried a full driver teardown in the shutdown callback and it added
a lot of latency that really wasn't needed since the system was just
going into a reboot or power down.  The best middle ground was to just
leverage our hw level suspend code to quiesce the hardware.  Adding
complexity to call drm_atomic_helper_suspend() vs
drm_atomic_helper_shutdown() doesn't seem worth it since the functions
do pretty much the same thing (both call
drm_atomic_helper_disable_all()).  Maybe it's better to update the
documentation to recommend drm_atomic_helper_suspend() if drivers want
to leverage their suspend code?

Alex
Alex Deucher June 19, 2024, 1:53 p.m. UTC | #6
On Wed, Jun 19, 2024 at 9:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > >
> > > > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > > >
> > > > > > Based on grepping through the source code this driver appears to be
> > > > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > > > time. Among other things, this means that if a panel is in use that it
> > > > > > won't be cleanly powered off at system shutdown time.
> > > > > >
> > > > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > > > instance overview" in drm_drv.c.
> > > > > >
> > > > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > ---
> > > > > > This commit is only compile-time tested.
> > > > > >
> > > > > > ...and further, I'd say that this patch is more of a plea for help
> > > > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > > > what this should look like could illuminate me, or perhaps even just
> > > > > > post a patch themselves!
> > > > >
> > > > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > > > do a formal tear down in its shutdown routine, it just quiesces the
> > > > > hardware.  What are the actual requirements of the shutdown function?
> > > > > In the past when we did a full driver tear down in shutdown, it
> > > > > delayed the shutdown sequence and users complained.
> > > >
> > > > The "inspiration" for this patch is to handle panels properly.
> > > > Specifically, panels often have several power/enable signals going to
> > > > them and often have requirements that these signals are powered off in
> > > > the proper order with the proper delays between them. While we can't
> > > > always do so when the system crashes / reboots in an uncontrolled way,
> > > > panel manufacturers / HW Engineers get upset if we don't power things
> > > > off properly during an orderly shutdown/reboot. When panels are
> > > > powered off badly it can cause garbage on the screen and, so I've been
> > > > told, can even cause long term damage to the panels over time.
> > > >
> > > > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > > > the panel by handling the shutdown() call themselves. However, this is
> > > > ugly and panel maintainers want panel drivers to stop doing it. We
> > > > have removed the code doing this from most panels now [1]. Instead the
> > > > assumption is that the DRM modeset drivers should be calling
> > > > drm_atomic_helper_shutdown() which will make sure panels get an
> > > > orderly shutdown.
> > > >
> > > > For a lot more details, see the cover letter [2] which then contains
> > > > links to even more discussions about the topic.
> > > >
> > > > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > > > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
> > >
> > > I don't think it's an issue.  We quiesce the hardware as if we were
> > > about to suspend the system (e.g., S3).  For the display hardware we
> > > call drm_atomic_helper_suspend() as part of that sequence.
> >
> > OK. It's no skin off my teeth and we can drop this patch if you're
> > convinced it's not needed. From the point of view of someone who has
> > no experience with this driver it seems weird to me that it would use
> > drm_atomic_helper_suspend() at shutdown time instead of the documented
> > drm_atomic_helper_shutdown(), but if it works for everyone then I'm
> > not gonna complain.
>
> I think the problem is that it is not clear exactly what the
> expectations are around the PCI shutdown callback.  The documentation
> says:
>
> "Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
> idling DMA operations. Useful for enabling wake-on-lan (NIC) or
> changing the power state of a device before reboot. e.g.
> drivers/net/e100.c."

Arguably, there is no requirement to even touch the display hardware
at all.  In theory you could just leave the display hardware as is in
the current state.  The system will either be rebooting or powering
down anyway.

Alex

>
> We tried a full driver teardown in the shutdown callback and it added
> a lot of latency that really wasn't needed since the system was just
> going into a reboot or power down.  The best middle ground was to just
> leverage our hw level suspend code to quiesce the hardware.  Adding
> complexity to call drm_atomic_helper_suspend() vs
> drm_atomic_helper_shutdown() doesn't seem worth it since the functions
> do pretty much the same thing (both call
> drm_atomic_helper_disable_all()).  Maybe it's better to update the
> documentation to recommend drm_atomic_helper_suspend() if drivers want
> to leverage their suspend code?
>
> Alex
Maxime Ripard June 20, 2024, 7:10 a.m. UTC | #7
Hi,

On Wed, Jun 19, 2024 at 09:53:12AM GMT, Alex Deucher wrote:
> On Wed, Jun 19, 2024 at 9:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > > > >
> > > > > > > Based on grepping through the source code this driver appears to be
> > > > > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > > > > time. Among other things, this means that if a panel is in use that it
> > > > > > > won't be cleanly powered off at system shutdown time.
> > > > > > >
> > > > > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > > > > instance overview" in drm_drv.c.
> > > > > > >
> > > > > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > > ---
> > > > > > > This commit is only compile-time tested.
> > > > > > >
> > > > > > > ...and further, I'd say that this patch is more of a plea for help
> > > > > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > > > > what this should look like could illuminate me, or perhaps even just
> > > > > > > post a patch themselves!
> > > > > >
> > > > > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > > > > do a formal tear down in its shutdown routine, it just quiesces the
> > > > > > hardware.  What are the actual requirements of the shutdown function?
> > > > > > In the past when we did a full driver tear down in shutdown, it
> > > > > > delayed the shutdown sequence and users complained.
> > > > >
> > > > > The "inspiration" for this patch is to handle panels properly.
> > > > > Specifically, panels often have several power/enable signals going to
> > > > > them and often have requirements that these signals are powered off in
> > > > > the proper order with the proper delays between them. While we can't
> > > > > always do so when the system crashes / reboots in an uncontrolled way,
> > > > > panel manufacturers / HW Engineers get upset if we don't power things
> > > > > off properly during an orderly shutdown/reboot. When panels are
> > > > > powered off badly it can cause garbage on the screen and, so I've been
> > > > > told, can even cause long term damage to the panels over time.
> > > > >
> > > > > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > > > > the panel by handling the shutdown() call themselves. However, this is
> > > > > ugly and panel maintainers want panel drivers to stop doing it. We
> > > > > have removed the code doing this from most panels now [1]. Instead the
> > > > > assumption is that the DRM modeset drivers should be calling
> > > > > drm_atomic_helper_shutdown() which will make sure panels get an
> > > > > orderly shutdown.
> > > > >
> > > > > For a lot more details, see the cover letter [2] which then contains
> > > > > links to even more discussions about the topic.
> > > > >
> > > > > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > > > > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
> > > >
> > > > I don't think it's an issue.  We quiesce the hardware as if we were
> > > > about to suspend the system (e.g., S3).  For the display hardware we
> > > > call drm_atomic_helper_suspend() as part of that sequence.
> > >
> > > OK. It's no skin off my teeth and we can drop this patch if you're
> > > convinced it's not needed. From the point of view of someone who has
> > > no experience with this driver it seems weird to me that it would use
> > > drm_atomic_helper_suspend() at shutdown time instead of the documented
> > > drm_atomic_helper_shutdown(), but if it works for everyone then I'm
> > > not gonna complain.
> >
> > I think the problem is that it is not clear exactly what the
> > expectations are around the PCI shutdown callback.  The documentation
> > says:
> >
> > "Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
> > idling DMA operations. Useful for enabling wake-on-lan (NIC) or
> > changing the power state of a device before reboot. e.g.
> > drivers/net/e100.c."
> 
> Arguably, there is no requirement to even touch the display hardware
> at all.  In theory you could just leave the display hardware as is in
> the current state.  The system will either be rebooting or powering
> down anyway.

I think it mostly boils down to a cultural mismatch :)

Doug works on panel for ARM systems, where devices need (and need to
handle) much more resources than what's typical on a system with an AMD
GPU.

So, for the kind of hardware Doug usually deals with, we definitely need
the shutdown hook to make sure the regulators, GPIOs, etc. supplying the
panel are properly shutdown.

And panels usually tied to AMD GPUs probably don't need any of that.

Maxime
Alex Deucher June 20, 2024, 1 p.m. UTC | #8
On Thu, Jun 20, 2024 at 3:10 AM Maxime Ripard <mripard@kernel.org> wrote:
>
> Hi,
>
> On Wed, Jun 19, 2024 at 09:53:12AM GMT, Alex Deucher wrote:
> > On Wed, Jun 19, 2024 at 9:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > > > > >
> > > > > > > > Based on grepping through the source code this driver appears to be
> > > > > > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > > > > > time. Among other things, this means that if a panel is in use that it
> > > > > > > > won't be cleanly powered off at system shutdown time.
> > > > > > > >
> > > > > > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > > > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > > > > > instance overview" in drm_drv.c.
> > > > > > > >
> > > > > > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > > > ---
> > > > > > > > This commit is only compile-time tested.
> > > > > > > >
> > > > > > > > ...and further, I'd say that this patch is more of a plea for help
> > > > > > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > > > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > > > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > > > > > what this should look like could illuminate me, or perhaps even just
> > > > > > > > post a patch themselves!
> > > > > > >
> > > > > > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > > > > > do a formal tear down in its shutdown routine, it just quiesces the
> > > > > > > hardware.  What are the actual requirements of the shutdown function?
> > > > > > > In the past when we did a full driver tear down in shutdown, it
> > > > > > > delayed the shutdown sequence and users complained.
> > > > > >
> > > > > > The "inspiration" for this patch is to handle panels properly.
> > > > > > Specifically, panels often have several power/enable signals going to
> > > > > > them and often have requirements that these signals are powered off in
> > > > > > the proper order with the proper delays between them. While we can't
> > > > > > always do so when the system crashes / reboots in an uncontrolled way,
> > > > > > panel manufacturers / HW Engineers get upset if we don't power things
> > > > > > off properly during an orderly shutdown/reboot. When panels are
> > > > > > powered off badly it can cause garbage on the screen and, so I've been
> > > > > > told, can even cause long term damage to the panels over time.
> > > > > >
> > > > > > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > > > > > the panel by handling the shutdown() call themselves. However, this is
> > > > > > ugly and panel maintainers want panel drivers to stop doing it. We
> > > > > > have removed the code doing this from most panels now [1]. Instead the
> > > > > > assumption is that the DRM modeset drivers should be calling
> > > > > > drm_atomic_helper_shutdown() which will make sure panels get an
> > > > > > orderly shutdown.
> > > > > >
> > > > > > For a lot more details, see the cover letter [2] which then contains
> > > > > > links to even more discussions about the topic.
> > > > > >
> > > > > > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > > > > > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
> > > > >
> > > > > I don't think it's an issue.  We quiesce the hardware as if we were
> > > > > about to suspend the system (e.g., S3).  For the display hardware we
> > > > > call drm_atomic_helper_suspend() as part of that sequence.
> > > >
> > > > OK. It's no skin off my teeth and we can drop this patch if you're
> > > > convinced it's not needed. From the point of view of someone who has
> > > > no experience with this driver it seems weird to me that it would use
> > > > drm_atomic_helper_suspend() at shutdown time instead of the documented
> > > > drm_atomic_helper_shutdown(), but if it works for everyone then I'm
> > > > not gonna complain.
> > >
> > > I think the problem is that it is not clear exactly what the
> > > expectations are around the PCI shutdown callback.  The documentation
> > > says:
> > >
> > > "Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
> > > idling DMA operations. Useful for enabling wake-on-lan (NIC) or
> > > changing the power state of a device before reboot. e.g.
> > > drivers/net/e100.c."
> >
> > Arguably, there is no requirement to even touch the display hardware
> > at all.  In theory you could just leave the display hardware as is in
> > the current state.  The system will either be rebooting or powering
> > down anyway.
>
> I think it mostly boils down to a cultural mismatch :)
>
> Doug works on panel for ARM systems, where devices need (and need to
> handle) much more resources than what's typical on a system with an AMD
> GPU.
>
> So, for the kind of hardware Doug usually deals with, we definitely need
> the shutdown hook to make sure the regulators, GPIOs, etc. supplying the
> panel are properly shutdown.
>
> And panels usually tied to AMD GPUs probably don't need any of that.

Makes sense.  I think drm_atomic_helper_suspend() is a viable
alternative if drivers want to leverage their existing suspend code.
I could write up a doc patch unless there is reason to prefer the
shutdown variant.

Alex
Maxime Ripard June 21, 2024, 8:42 a.m. UTC | #9
On Thu, Jun 20, 2024 at 09:00:23AM GMT, Alex Deucher wrote:
> On Thu, Jun 20, 2024 at 3:10 AM Maxime Ripard <mripard@kernel.org> wrote:
> >
> > Hi,
> >
> > On Wed, Jun 19, 2024 at 09:53:12AM GMT, Alex Deucher wrote:
> > > On Wed, Jun 19, 2024 at 9:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > > > > > > >
> > > > > > > > > Based on grepping through the source code this driver appears to be
> > > > > > > > > missing a call to drm_atomic_helper_shutdown() at system shutdown
> > > > > > > > > time. Among other things, this means that if a panel is in use that it
> > > > > > > > > won't be cleanly powered off at system shutdown time.
> > > > > > > > >
> > > > > > > > > The fact that we should call drm_atomic_helper_shutdown() in the case
> > > > > > > > > of OS shutdown/restart comes straight out of the kernel doc "driver
> > > > > > > > > instance overview" in drm_drv.c.
> > > > > > > > >
> > > > > > > > > Suggested-by: Maxime Ripard <mripard@kernel.org>
> > > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > > > > > Cc: Xinhui Pan <Xinhui.Pan@amd.com>
> > > > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > > > > > ---
> > > > > > > > > This commit is only compile-time tested.
> > > > > > > > >
> > > > > > > > > ...and further, I'd say that this patch is more of a plea for help
> > > > > > > > > than a patch I think is actually right. I'm _fairly_ certain that
> > > > > > > > > drm/amdgpu needs this call at shutdown time but the logic is a bit
> > > > > > > > > hard for me to follow. I'd appreciate if anyone who actually knows
> > > > > > > > > what this should look like could illuminate me, or perhaps even just
> > > > > > > > > post a patch themselves!
> > > > > > > >
> > > > > > > > I'm not sure this patch makes sense or not.  The driver doesn't really
> > > > > > > > do a formal tear down in its shutdown routine, it just quiesces the
> > > > > > > > hardware.  What are the actual requirements of the shutdown function?
> > > > > > > > In the past when we did a full driver tear down in shutdown, it
> > > > > > > > delayed the shutdown sequence and users complained.
> > > > > > >
> > > > > > > The "inspiration" for this patch is to handle panels properly.
> > > > > > > Specifically, panels often have several power/enable signals going to
> > > > > > > them and often have requirements that these signals are powered off in
> > > > > > > the proper order with the proper delays between them. While we can't
> > > > > > > always do so when the system crashes / reboots in an uncontrolled way,
> > > > > > > panel manufacturers / HW Engineers get upset if we don't power things
> > > > > > > off properly during an orderly shutdown/reboot. When panels are
> > > > > > > powered off badly it can cause garbage on the screen and, so I've been
> > > > > > > told, can even cause long term damage to the panels over time.
> > > > > > >
> > > > > > > In Linux, some panel drivers have tried to ensure a proper poweroff of
> > > > > > > the panel by handling the shutdown() call themselves. However, this is
> > > > > > > ugly and panel maintainers want panel drivers to stop doing it. We
> > > > > > > have removed the code doing this from most panels now [1]. Instead the
> > > > > > > assumption is that the DRM modeset drivers should be calling
> > > > > > > drm_atomic_helper_shutdown() which will make sure panels get an
> > > > > > > orderly shutdown.
> > > > > > >
> > > > > > > For a lot more details, see the cover letter [2] which then contains
> > > > > > > links to even more discussions about the topic.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> > > > > > > [2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
> > > > > >
> > > > > > I don't think it's an issue.  We quiesce the hardware as if we were
> > > > > > about to suspend the system (e.g., S3).  For the display hardware we
> > > > > > call drm_atomic_helper_suspend() as part of that sequence.
> > > > >
> > > > > OK. It's no skin off my teeth and we can drop this patch if you're
> > > > > convinced it's not needed. From the point of view of someone who has
> > > > > no experience with this driver it seems weird to me that it would use
> > > > > drm_atomic_helper_suspend() at shutdown time instead of the documented
> > > > > drm_atomic_helper_shutdown(), but if it works for everyone then I'm
> > > > > not gonna complain.
> > > >
> > > > I think the problem is that it is not clear exactly what the
> > > > expectations are around the PCI shutdown callback.  The documentation
> > > > says:
> > > >
> > > > "Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
> > > > idling DMA operations. Useful for enabling wake-on-lan (NIC) or
> > > > changing the power state of a device before reboot. e.g.
> > > > drivers/net/e100.c."
> > >
> > > Arguably, there is no requirement to even touch the display hardware
> > > at all.  In theory you could just leave the display hardware as is in
> > > the current state.  The system will either be rebooting or powering
> > > down anyway.
> >
> > I think it mostly boils down to a cultural mismatch :)
> >
> > Doug works on panel for ARM systems, where devices need (and need to
> > handle) much more resources than what's typical on a system with an AMD
> > GPU.
> >
> > So, for the kind of hardware Doug usually deals with, we definitely need
> > the shutdown hook to make sure the regulators, GPIOs, etc. supplying the
> > panel are properly shutdown.
> >
> > And panels usually tied to AMD GPUs probably don't need any of that.
> 
> Makes sense.  I think drm_atomic_helper_suspend() is a viable
> alternative if drivers want to leverage their existing suspend code.
> I could write up a doc patch unless there is reason to prefer the
> shutdown variant.

It could be worth adding that mention in the _shutdown() doc indeed, but
I'm not sure we need to enforce anything at this point. I'd rather have
either, at the driver's authors discretion.

Maxime
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f87d53e183c3..c202a1d5ff5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1197,6 +1197,7 @@  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_device *bdev)
 int amdgpu_device_init(struct amdgpu_device *adev,
 		       uint32_t flags);
 void amdgpu_device_fini_hw(struct amdgpu_device *adev);
+void amdgpu_device_shutdown_hw(struct amdgpu_device *adev);
 void amdgpu_device_fini_sw(struct amdgpu_device *adev);
 
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 861ccff78af9..a8c4b8412e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4531,6 +4531,16 @@  void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 }
 
+void amdgpu_device_shutdown_hw(struct amdgpu_device *adev)
+{
+	if (adev->mode_info.mode_config_initialized) {
+		if (!drm_drv_uses_atomic_modeset(adev_to_drm(adev)))
+			drm_helper_force_disable_all(adev_to_drm(adev));
+		else
+			drm_atomic_helper_shutdown(adev_to_drm(adev));
+	}
+}
+
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 {
 	int idx;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ea14f1c8f430..b34bf9259d5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2409,6 +2409,8 @@  amdgpu_pci_shutdown(struct pci_dev *pdev)
 	struct drm_device *dev = pci_get_drvdata(pdev);
 	struct amdgpu_device *adev = drm_to_adev(dev);
 
+	amdgpu_device_shutdown_hw(adev);
+
 	if (amdgpu_ras_intr_triggered())
 		return;