[git,pull] drm fixes for 6.11-rc6

Message ID CAPM=9tzX71UKndfu8JECMOzc9kf4s4pp9cWTMWwE476cQXt_Yw@mail.gmail.com (mailing list archive)
State New, archived
Series [git,pull] drm fixes for 6.11-rc6

Pull-request

https://gitlab.freedesktop.org/drm/kernel.git tags/drm-fixes-2024-08-30

Message

Dave Airlie Aug. 30, 2024, 2:08 a.m. UTC
Hey Linus,

Another week, another set of GPU fixes: amdgpu and vmwgfx lead the
charge, followed by i915 and xe changes along with v3d and some other
bits. The TTM revert addresses stuttering in some graphical apps,
probably caused by longer stalls while prefaulting.

Seems pretty much where I'd expect things,

Dave.

drm-fixes-2024-08-30:
drm fixes for 6.11-rc6

ttm:
- revert prefault change, caused stutters

aperture:
- handle non-VGA devices better

amdgpu:
- SWSMU gaming stability fix
- SMU 13.0.7 fix
- SWSMU documentation alignment fix
- SMU 14.0.x fixes
- GC 12.x fix
- Display fix
- IP discovery fix
- SMU 13.0.6 fix

i915:
- Fix #11195: the external display connected via a USB Type-C dock stays
blank after reconnecting the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
- Move ARL GuC firmware to correct version

xe:
- Invalidate media_gt TLBs
- Fix HWMON i1 power setup write command

vmwgfx:
- prevent unmapping active read buffers
- fix prime with external buffers
- disable coherent dumb buffers without 3d

v3d:
- disable preemption while updating GPU stats

The following changes since commit 5be63fc19fcaa4c236b307420483578a56986a37:

  Linux 6.11-rc5 (2024-08-25 19:07:11 +1200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/kernel.git tags/drm-fixes-2024-08-30

for you to fetch changes up to 27f5b729cb56e46d8beca47c227c0edf1e958fbb:

  Merge tag 'drm-misc-fixes-2024-08-29' of
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
(2024-08-30 11:28:11 +1000)

----------------------------------------------------------------
drm fixes for 6.11-rc6

ttm:
- revert prefault change, caused stutters

aperture:
- handle non-VGA devices better

amdgpu:
- SWSMU gaming stability fix
- SMU 13.0.7 fix
- SWSMU documentation alignment fix
- SMU 14.0.x fixes
- GC 12.x fix
- Display fix
- IP discovery fix
- SMU 13.0.6 fix

i915:
- Fix #11195: the external display connected via a USB Type-C dock stays
blank after reconnecting the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
- Move ARL GuC firmware to correct version

xe:
- Invalidate media_gt TLBs
- Fix HWMON i1 power setup write command

vmwgfx:
- prevent unmapping active read buffers
- fix prime with external buffers
- disable coherent dumb buffers without 3d

v3d:
- disable preemption while updating GPU stats

----------------------------------------------------------------
Alex Deucher (6):
      Revert "drm/ttm: increase ttm pre-fault value to PMD size"
      video/aperture: optionally match the device in sysfb_disable()
      drm/amdgpu: align pp_power_profile_mode with kernel docs
      drm/amdgpu/smu13.0.7: print index for profiles
      drm/amdgpu/swsmu: always force a state reprogram on init
      drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs

Candice Li (1):
      drm/amd/pm: Drop unsupported features on smu v14_0_2

Dave Airlie (4):
      Merge tag 'amd-drm-fixes-6.11-2024-08-28' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2024-08-29' of
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
      Merge tag 'drm-xe-fixes-2024-08-29' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
      Merge tag 'drm-misc-fixes-2024-08-29' of
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Hans de Goede (1):
      drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict

Imre Deak (1):
      drm/i915/dp_mst: Fix MST state after a sink reset

John Harrison (1):
      drm/i915: ARL requires a newer GSC firmware

Karthik Poosa (1):
      drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16

Kenneth Feng (1):
      drm/amd/pm: update message interface for smu v14.0.2/3

Lijo Lazar (1):
      drm/amd/pm: Add support for new P2S table revision

Likun Gao (1):
      drm/amdgpu: support for gc_info table v1.3

Ma Ke (1):
      drm/amd/display: avoid using null object of framebuffer

Matthew Brost (1):
      drm/xe: Invalidate media_gt TLBs

Tvrtko Ursulin (1):
      drm/v3d: Disable preemption while updating GPU stats

Zack Rusin (3):
      drm/vmwgfx: Prevent unmapping active read buffers
      drm/vmwgfx: Fix prime with external buffers
      drm/vmwgfx: Disable coherent dumb buffers without 3d

 drivers/firmware/sysfb.c                           |  19 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c      |  11 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h            |   6 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c             |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c   |   1 +
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c    |   9 +-
 drivers/gpu/drm/amd/include/discovery.h            |  42 ++++++++
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c          |  21 ++--
 .../amd/pm/swsmu/inc/pmfw_if/smu_v14_0_2_ppsmc.h   |  18 +++-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c   |   7 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c   |   4 +-
 .../gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c   |  48 ---------
 drivers/gpu/drm/i915/display/intel_dp.c            |  12 +++
 drivers/gpu/drm/i915/display/intel_dp_mst.c        |  40 ++++++++
 drivers/gpu/drm/i915/display/intel_dp_mst.h        |   1 +
 drivers/gpu/drm/i915/display/vlv_dsi.c             |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c          |  31 ++++++
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c           |  10 +-
 drivers/gpu/drm/i915/i915_drv.h                    |   2 +
 drivers/gpu/drm/i915/intel_device_info.c           |   7 ++
 drivers/gpu/drm/i915/intel_device_info.h           |   3 +
 drivers/gpu/drm/v3d/v3d_sched.c                    |   6 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c               | 114 ++++++++++++++++++++-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c                 |  13 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h                 |   3 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h                |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c               |  12 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c            |   6 +-
 drivers/gpu/drm/xe/xe_hwmon.c                      |   2 +-
 drivers/gpu/drm/xe/xe_vm.c                         |  37 ++++---
 drivers/of/platform.c                              |   2 +-
 drivers/video/aperture.c                           |  11 +-
 include/drm/intel/i915_pciids.h                    |  11 +-
 include/drm/ttm/ttm_bo.h                           |   4 -
 include/linux/sysfb.h                              |   4 +-
 35 files changed, 398 insertions(+), 126 deletions(-)

Comments

Linus Torvalds Aug. 30, 2024, 2:32 a.m. UTC | #1
On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@gmail.com> wrote:
>
> The TTM revert is due to some stuttering graphical apps probably due
> to longer stalls while prefaulting.

Yeah, trying to pre-fault a PMD worth of pages in one go is just crazy talk.

Now, if it was PMD-aligned and you faulted in a single PMD, that would
be different. But just doing prn_insert_page() in a loop is insane.

The code doesn't even stop when it hits a page that already existed,
and it keeps locking and unlocking the last-level page table over and
over again.

Honestly, that code is questionable even for the *small* value, much
less the "a PMD size" case.

Now, if you have an array of 'struct page *', you can use
vm_insert_pages(), and that's reasonably efficient.

And if you have a *contiguous* area of pfns, you can use remap_pfn_range().

But that "insert one pfn at a time" that the drm layer does is
complete garbage. You're not speeding anything up, you're just digging
deeper.

                  Linus
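
For illustration, a minimal sketch (not part of the thread) of the two
shapes contrasted above: the per-pfn insertion loop a fault handler does
today versus a batched vm_insert_pages() call when the backing memory is
an array of struct page pointers. The function names and surrounding
context here are hypothetical.

#include <linux/mm.h>

/*
 * Per-pfn prefault loop, roughly the pattern criticized above: every
 * vmf_insert_pfn() call walks down to and locks the last-level page
 * table again, and the loop does not stop when it hits an entry that
 * is already populated.
 */
static vm_fault_t prefault_one_by_one(struct vm_fault *vmf,
				      unsigned long pfn, unsigned long count)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long addr = vmf->address;
	vm_fault_t ret = VM_FAULT_NOPAGE;
	unsigned long i;

	for (i = 0; i < count; i++, addr += PAGE_SIZE, pfn++) {
		ret = vmf_insert_pfn(vma, addr, pfn);
		if (unlikely(ret & VM_FAULT_ERROR))
			break;
	}
	return ret;
}

/*
 * When the backing store is an array of struct page pointers, the
 * batched helper amortizes the page-table walk and locking over the
 * whole range. vm_insert_pages() decrements 'nr' by the number of
 * pages it managed to insert.
 */
static int map_page_array(struct vm_area_struct *vma, unsigned long addr,
			  struct page **pages, unsigned long nr)
{
	return vm_insert_pages(vma, addr, pages, &nr);
}
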
pr-tracker-bot@kernel.org Aug. 30, 2024, 2:44 a.m. UTC | #2
The pull request you sent on Fri, 30 Aug 2024 12:08:41 +1000:

> https://gitlab.freedesktop.org/drm/kernel.git tags/drm-fixes-2024-08-30

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/20371ba120635d9ab7fc7670497105af8f33eb08

Thank you!
Dave Airlie Sept. 1, 2024, 10:13 p.m. UTC | #3
On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@gmail.com> wrote:
> >
> > The TTM revert is due to some stuttering graphical apps probably due
> > to longer stalls while prefaulting.
>
> Yeah, trying to pre-fault a PMD worth of pages in one go is just crazy talk.
>
> Now, if it was PMD-aligned and you faulted in a single PMD, that would
> be different. But just doing prn_insert_page() in a loop is insane.
>
> The code doesn't even stop when it hits a page that already existed,
> and it keeps locking and unlocking the last-level page table over and
> over again.
>
> Honestly, that code is questionable even for the *small* value, much
> less the "a PMD size" case.
>
> Now, if you have an array of 'struct page *', you can use
> vm_insert_pages(), and that's reasonably efficient.
>
> And if you have a *contiguous* area of pfns, you can use remap_pfn_range().
>
> But that "insert one pfn at a time" that the drm layer does is
> complete garbage. You're not speeding anything up, you're just digging
> deeper.

I wonder if there is functionality that could be provided in a common
helper, by the mm layers, or if there would be too many locking
interactions to make it sane,

It seems too fraught with danger for drivers or subsystems to be just
doing this in the simplest way that isn't actually that smart.

Dave.
Thomas Hellstrom Sept. 2, 2024, 9:32 a.m. UTC | #4
On Mon, 2024-09-02 at 08:13 +1000, Dave Airlie wrote:
> On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > 
> > On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@gmail.com>
> > wrote:
> > > 
> > > The TTM revert is due to some stuttering graphical apps probably
> > > due
> > > to longer stalls while prefaulting.
> > 
> > Yeah, trying to pre-fault a PMD worth of pages in one go is just
> > crazy talk.
> > 
> > Now, if it was PMD-aligned and you faulted in a single PMD, that
> > would
> > be different. But just doing prn_insert_page() in a loop is insane.
> > 
> > The code doesn't even stop when it hits a page that already
> > existed,
> > and it keeps locking and unlocking the last-level page table over
> > and
> > over again.
> > 
> > Honestly, that code is questionable even for the *small* value,
> > much
> > less the "a PMD size" case.
> > 
> > Now, if you have an array of 'struct page *', you can use
> > vm_insert_pages(), and that's reasonably efficient.
> > 
> > And if you have a *contiguous* area of pfns, you can use
> > remap_pfn_range().
> > 
> > But that "insert one pfn at a time" that the drm layer does is
> > complete garbage. You're not speeding anything up, you're just
> > digging
> > deeper.


> 
> I wonder if there is functionality that could be provided in a common
> helper, by the mm layers, or if there would be too many locking
> interactions to make it sane,
> 
> It seems too fraught with danger for drivers or subsystems to be just
> doing this in the simplest way that isn't actually that smart.

Hmm. I see even the "Don't error on prefaults" check was broken at some
point :/.

There have been numerous ways to try to address this,

The remap_pfn_range was last tried, at least in the context of the i915
driver IIRC by Christoph Hellwig but had to be ripped out since it
requires the mmap_lock in write mode. Here we have it only in read
mode.

Then there's the apply_to_page_range() used by the igfx functionality
of the i915 driver. I don't think we should go that route without
turning it into something like vm_insert_pfns() with proper checking.
This approach populates all entries of a buffer object.

Finally there's the huge fault attempt that had to be ripped out due to
lack of pmd_special and pud_special flags and resulting clashes with
gup_fast.

Perhaps a combination of the two latter if properly implemented would
be the best choice.

/Thomas

> 
> Dave.
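
For reference, a sketch (not from the thread) of the apply_to_page_range()
pattern mentioned above, in the shape of i915's io-mapping remap helper:
a callback that writes each pte. The callback writes every entry
unconditionally, with no check for an already-present pte, which is the
"proper checking" a vm_insert_pfns()-style helper would need to add. The
names here are illustrative.

#include <linux/mm.h>

struct remap_ctx {
	struct mm_struct *mm;
	unsigned long pfn;
	pgprot_t prot;
};

/* Called for every pte slot in the range by apply_to_page_range(). */
static int remap_pfn_cb(pte_t *pte, unsigned long addr, void *data)
{
	struct remap_ctx *r = data;

	/* Special ptes are not associated with any struct page. */
	set_pte_at(r->mm, addr, pte, pte_mkspecial(pfn_pte(r->pfn, r->prot)));
	r->pfn++;
	return 0;
}

static int remap_io_range(struct vm_area_struct *vma, unsigned long addr,
			  unsigned long pfn, unsigned long size)
{
	struct remap_ctx r = {
		.mm = vma->vm_mm,
		.pfn = pfn,
		.prot = vma->vm_page_prot,
	};

	/* Allocates any missing page-table levels and invokes the
	 * callback once per pte across [addr, addr + size). */
	return apply_to_page_range(vma->vm_mm, addr, size, remap_pfn_cb, &r);
}
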
Christian König Sept. 2, 2024, 10:33 a.m. UTC | #5
Am 02.09.24 um 11:32 schrieb Thomas Hellström:
> On Mon, 2024-09-02 at 08:13 +1000, Dave Airlie wrote:
>> On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@gmail.com>
>>> wrote:
>>>> The TTM revert is due to some stuttering graphical apps probably
>>>> due
>>>> to longer stalls while prefaulting.
>>> Yeah, trying to pre-fault a PMD worth of pages in one go is just
>>> crazy talk.
>>>
>>> Now, if it was PMD-aligned and you faulted in a single PMD, that
>>> would
>>> be different. But just doing prn_insert_page() in a loop is insane.
>>>
>>> The code doesn't even stop when it hits a page that already
>>> existed,
>>> and it keeps locking and unlocking the last-level page table over
>>> and
>>> over again.
>>>
>>> Honestly, that code is questionable even for the *small* value,
>>> much
>>> less the "a PMD size" case.
>>>
>>> Now, if you have an array of 'struct page *', you can use
>>> vm_insert_pages(), and that's reasonably efficient.
>>>
>>> And if you have a *contiguous* area of pfns, you can use
>>> remap_pfn_range().
>>>
>>> But that "insert one pfn at a time" that the drm layer does is
>>> complete garbage. You're not speeding anything up, you're just
>>> digging
>>> deeper.
>
>> I wonder if there is functionality that could be provided in a common
>> helper, by the mm layers, or if there would be too many locking
>> interactions to make it sane,
>>
>> It seems too fraught with danger for drivers or subsystems to be just
>> doing this in the simplest way that isn't actually that smart.
> Hmm. I see even the "Don't error on prefaults" check was broken at some
> point :/.
>
> There have been numerous ways to try to address this,
>
> The remap_pfn_range was last tried, at least in the context of the i915
> driver IIRC by Christoph Hellwig but had to be ripped out since it
> requires the mmap_lock in write mode. Here we have it only in read
> mode.
>
> Then there's the apply_to_page_range() used by the igfx functionality
> of the i915 driver. I don't think we should go that route without
> turning it into something like vm_insert_pfns() with proper checking.
> This approach populates all entries of a buffer object.
>
> Finally there's the huge fault attempt that had to be ripped out due to
> lack of pmd_special and pud_special flags and resulting clashes with
> gup_fast.
>
> Perhaps a combination of the two latter if properly implemented would
> be the best choice.

I'm not deep enough into the memory management background to judge which 
approach is the best, just one more data point to provide:

The pre-faulting was increased because of virtualization. When KVM/XEN 
is mapping a BO into a guest the switching overhead for each fault is so 
high that mapping a lot of PFNs at the same time becomes beneficial.

Regards,
Christian.

>
> /Thomas
>
>> Dave.
Linus Torvalds Sept. 2, 2024, 7:37 p.m. UTC | #6
On Mon, 2 Sept 2024 at 03:34, Christian König <christian.koenig@amd.com> wrote:
>
> Am 02.09.24 um 11:32 schrieb Thomas Hellström:
> >
> > The remap_pfn_range was last tried, at least in the context of the i915
> > driver IIRC by Christoph Hellwig but had to be ripped out since it
> > requires the mmap_lock in write mode. Here we have it only in read
> > mode.

Yeah, that does make things much more fragile. The "fill in multiple
pages" helpers are really meant to be used at mmap() time, not as some
kind of fault-around.

So remap_pfn_range() sets the vma flags etc, but it also depends on
being the only one to fill in the ptes, and will be *very* unhappy if
it finds something that has already been filled in.

And fault-around is *much* more fragile because unlike the mmap time
thing, it can happen concurrently with other faults, and you cannot
make assumptions about existing page table layout etc.

> The pre-faulting was increased because of virtualization. When KVM/XEN
> is mapping a BO into a guest the switching overhead for each fault is so
> high that mapping a lot of PFNs at the same time becomes beneficial.

Honestly, from a pure performance standpoint, if you can just do all
the page mapping at mmap() time, that would be *much* much better.

Then you can easily use that remap_pfn_range() function, and latencies
are much less of an issue in general (not because it would be any
faster, but simply because people don't tend to use mmap() in
latency-critical code - but taking a page fault is *easily* very
critical indeed).

                Linus
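
A minimal sketch (not from the thread) of the mmap()-time shape suggested
above: remap_pfn_range() fills every pte once while the mmap_lock is held
for write, so no fault handler and no per-page faults are needed later.
'my_obj' and its fields are hypothetical, offset handling is elided, and
this sidesteps the eviction/migration problem raised further down.

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical object backed by a pinned, physically contiguous region. */
struct my_obj {
	unsigned long base_pfn;
	unsigned long size;
};

static int my_obj_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct my_obj *obj = file->private_data;
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > obj->size)
		return -EINVAL;

	/*
	 * Populates every pte for the VMA up front (remap_pfn_range()
	 * also sets VM_IO | VM_PFNMAP on the VMA itself), so the mapping
	 * never takes a page fault afterwards.
	 */
	return remap_pfn_range(vma, vma->vm_start, obj->base_pfn, size,
			       vma->vm_page_prot);
}
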
Thomas Hellstrom Sept. 3, 2024, 9:26 a.m. UTC | #7
On Mon, 2024-09-02 at 12:33 +0200, Christian König wrote:
> Am 02.09.24 um 11:32 schrieb Thomas Hellström:
> > On Mon, 2024-09-02 at 08:13 +1000, Dave Airlie wrote:
> > > On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > > On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@gmail.com>
> > > > wrote:
> > > > > The TTM revert is due to some stuttering graphical apps
> > > > > probably
> > > > > due
> > > > > to longer stalls while prefaulting.
> > > > Yeah, trying to pre-fault a PMD worth of pages in one go is
> > > > just
> > > > crazy talk.
> > > > 
> > > > Now, if it was PMD-aligned and you faulted in a single PMD,
> > > > that
> > > > would
> > > > be different. But just doing prn_insert_page() in a loop is
> > > > insane.
> > > > 
> > > > The code doesn't even stop when it hits a page that already
> > > > existed,
> > > > and it keeps locking and unlocking the last-level page table
> > > > over
> > > > and
> > > > over again.
> > > > 
> > > > Honestly, that code is questionable even for the *small* value,
> > > > much
> > > > less the "a PMD size" case.
> > > > 
> > > > Now, if you have an array of 'struct page *', you can use
> > > > vm_insert_pages(), and that's reasonably efficient.
> > > > 
> > > > And if you have a *contiguous* area of pfns, you can use
> > > > remap_pfn_range().
> > > > 
> > > > But that "insert one pfn at a time" that the drm layer does is
> > > > complete garbage. You're not speeding anything up, you're just
> > > > digging
> > > > deeper.
> > 
> > > I wonder if there is functionality that could be provided in a
> > > common
> > > helper, by the mm layers, or if there would be too many locking
> > > interactions to make it sane,
> > > 
> > > It seems too fraught with danger for drivers or subsystems to be
> > > just
> > > doing this in the simplest way that isn't actually that smart.
> > Hmm. I see even the "Don't error on prefaults" check was broken at
> > some
> > point :/.
> > 
> > There have been numerous ways to try to address this,
> > 
> > The remap_pfn_range was last tried, at least in the context of the
> > i915
> > driver IIRC by Christoph Hellwig but had to be ripped out since it
> > requires the mmap_lock in write mode. Here we have it only in read
> > mode.
> > 
> > Then there's the apply_to_page_range() used by the igfx
> > functionality
> > of the i915 driver. I don't think we should go that route without
> > turning it into something like vm_insert_pfns() with proper
> > checking.
> > This approach populates all entries of a buffer object.
> > 
> > Finally there's the huge fault attempt that had to be ripped out
> > due to
> > lack of pmd_special and pud_special flags and resulting clashes
> > with
> > gup_fast.
> > 
> > Perhaps a combination of the two latter if properly implemented
> > would
> > be the best choice.
> 
> I'm not deep enough into the memory management background to judge
> which 
> approach is the best, just one more data point to provide:
> 
> The pre-faulting was increased because of virtualization. When
> KVM/XEN 
> is mapping a BO into a guest the switching overhead for each fault is
> so 
> high that mapping a lot of PFNs at the same time becomes beneficial.

Since populating at mmap time is not possible due to eviction /
migration, perhaps one way would be to use madvise() to toggle
prefaulting size? MADV_RANDOM vs MADV_SEQUENTIAL vs MADV_NORMAL.

/Thomas

> 
> Regards,
> Christian.
> 
> > 
> > /Thomas
> > 
> > > Dave.
>
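
A sketch (not from the thread) of how the madvise() idea above could look
inside a TTM-style fault handler, using only the existing VM_RAND_READ /
VM_SEQ_READ flags that MADV_RANDOM / MADV_SEQUENTIAL already set on the
VMA. The helper name is hypothetical; the constants mirror TTM's current
16-page default and the reverted PMD-sized value.

#include <linux/mm.h>

/*
 * Pick the prefault window from the per-VMA madvise() hints:
 *  - MADV_RANDOM     (VM_RAND_READ): only the page that actually faulted
 *  - MADV_SEQUENTIAL (VM_SEQ_READ):  a PMD worth of pages
 *  - default:                        TTM's current 16-page window
 */
static unsigned long prefault_window(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_RAND_READ)
		return 1;
	if (vma->vm_flags & VM_SEQ_READ)
		return PMD_SIZE >> PAGE_SHIFT;
	return 16;	/* i.e. TTM_BO_VM_NUM_PREFAULT today */
}

A driver fault handler could then pass this value in place of the fixed
TTM_BO_VM_NUM_PREFAULT constant when calling ttm_bo_vm_fault_reserved().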